INTRODUCTION

The  Nature  of  the  Study 

The  research  presented  in  these  pages  is  concerned  with  the 
development  of  a  means  of  testing  musical  aptitude.  The  study  is 
based  on  the  thesis  that  the  ability  to  differentiate  between  musical 
intervals  may  be  closely  related  to  important  complex  thought  proc- 
esses of  a  tonal  nature  which  are  brought  into  play  in  the  under- 
standing and  performance  of  musical  values.  In  order  to  validate 
this  thesis  it  has  been  necessary  first  to  develop  an  exploratory  test 
instrument  to  measure  this  function,  since  no  completely  satisfac- 
tory test  of  the  function  exists  for  the  purpose.  The  ultimate 
objective  of  the  study  is  to  determine  the  extent  to  which  a  measure 
of  this  function  is  related  to  some  important  criteria  of  musical 
ability,  and  to  note  the  power  of  this  exploratory  instrument  of 
measurement  to  differentiate  between  groups  on  the  basis  of  musical 
criteria. 

Much  more  has  been  involved,  however,  than  the  construction  of 
a  test  followed  by  the  usual  validation  procedures.  Music  being  the 
intangible  and  subjective  experience  that  it  is,  an  investigation  of 
this  field  necessitated  preliminary  work  of  an  exploratory  nature 
before  an  objective  experimental  design  could  be  put  into  effect. 
There  is  one  concept,  however,  which  remained  constant  throughout 
the  entire  course  of  the  investigation.  This  concept  consists  of  a 
definite  philosophy  as  to  the  basic  nature  of  all  forms  of  musical 
thought  and  experience.  The  maintenance  of  this  point  of  depar- 
ture made  it  possible  to  enter  upon  a  series  of  exploratory  studies 
with  some  confidence  that  in  the  end  appreciable  and  valid  results 
might  be  obtained. 

Preliminary  work  was  in  the  nature  of  probing,  with  the  help 
of  objective  and  subjective  data,  in  an  effort  to  seize  upon  promising 
clues  as  they  presented  themselves.  A  report  of  this  work  is  in- 
cluded not  only  to  lend  support  and  justification  to  succeeding  steps 
of  the  study,  but  also  to  indicate  the  manner  in  which  the  investiga- 
tion developed,  for  its  possible  aid  to  other  research  workers  who 
may  wish  to  work  on  similar  problems. 

There are a number of crucial points on which the entire investigation
pivots. First is the need for justifying the selection of interval
awareness or discrimination as a function likely to show relationship
to important musical criteria. The function must show a certain
psychological relationship to musical thought processes in keeping
with the point of view of the study on the fundamental nature
of musical meaning.

Second  is  the  necessity  for  securing  a  valid,  psychological,  and 
unitary  response  on  the  function  to  be  studied.  To  be  an  effective 
response  for  use  in  measurement  it  must  represent  a  perceptual  abil- 
ity associated  with  this  function  alone,  and  not  be  influenced  by 
any  of  the  more  complex  patterns  of  the  structure  of  music.  Third, 
the  unitary  response  so  determined  must  be  cast  into  some  objective 
form  so  that  it  may  serve  as  a  reliable  means  of  testing.  Moreover, 
certain  patterns  of  test  construction  need  study  so  that  the  greatest 
possible  validity  may  be  given  the  final  instrument  of  measurement. 
The  last  point  is  the  matter  of  selecting  significant  and  important 
criteria  of  musical  behavior  for  the  purpose  of  validating  and  other- 
wise testing  the  performance  of  the  exploratory  instrument  of  mea- 
surement. 

Not  until  each  pivotal  point  is  made  as  secure  as  possible  can 
the  study  proceed  to  the  examination  of  the  main  thesis,  which  is  to 
ascertain  the  extent  to  which  a  measure  of  the  function  of  interval 
discrimination  can  serve  as  an  index  of  the  larger,  more  important 
aspects  of  musical  ability. 

Division  of  the  Study 

The  account  of  this  research  is  divided  into  five  parts,  with  a 
chapter  devoted  to  each.  The  first  division  presents  a  frame  of 
reference.  Included  in  this  chapter  is  a  brief  review  of  the  general 
need  for  testing  in  music  education.  The  selection  of  the  function 
of  interval  discrimination  as  an  index  of  musical  aptitude  is  dis- 
cussed. The  nature  of  musicality  is  reviewed,  and  the  psychologi- 
cal relationships  between  the  function  of  interval  awareness  and 
more  highly  organized  patterns  of  musical  behavior  are  considered. 

The  second  chapter  is  concerned  with  the  necessary  preliminary 
work  of  ascertaining  a  unitary  psychological  response  to  the  func- 
tion of  interval  discrimination.  Results  of  group  and  individual 
testing  are  presented  to  support  the  selection  of  the  specific  man- 
ner of  securing  the  response  ultimately  used  as  a  basis  for  final  test- 
ing. Some  tentative  reliability  and  validity  coefficients  on  experi- 
mental test  forms  are  reported. 

The  third  chapter  describes  the  work  of  obtaining  greater  preci- 
sion in  the  measurement  of  the  function  and  of  developing  a  test 
suitable for carrying out the purpose of the investigation. An
experimental design was devised, using the specific manner of response
which the preliminary psychological inquiry developed for
use  in  testing.  This  design,  incorporating  a  number  of  patterns  of 
response,  was  made  a  part  of  several  experimental  test  forms  and 
administered  to  a  selection  of  elementary  and  secondary  school 
pupils. From the results of this testing the apparent effect on item
validity of some of the patterns of response entering into the experimental
design is studied. The manner in which these data have
been  used  in  the  building  of  a  final  test  form  is  then  reported. 

The  fourth  chapter  is  devoted  to  various  aspects  of  reliability 
and  validity  of  the  final  test  developed  for  the  study.  An  attempt 
is  also  made  to  evaluate  the  findings  in  the  light  of  the  results  of 
other  studies  in  music  testing.  Scores  on  the  test  are  correlated 
with  selected  criteria,  and  a  study  of  group  differences  is  made  to 
determine  the  differentiating  power  of  the  test.  Relationships  of 
the  test  with  certain  standardized  music  tests  are  reported. 

The  fifth  and  final  chapter  discusses  the  nature  and  significance 
of the intervalic function in its relation to larger patterns of behavior
in music. The contribution of the findings of this study to this
knowledge is discussed. Concluding the entire work is a discussion
of possible uses of the method of testing developed in this
study  in  the  field  of  education  and  research. 

Sources  of  Material 

Test  data  and  other  material  were  obtained  through  the  coopera- 
tion of  public  schools,  colleges,  and  music  conservatories.  Prelimin- 
ary measurements  were  carried  out  at  P.S.  165  and  P.S.  101  of  the 
New  York  City  public  schools.  Several  secondary  schools  in  the 
city  of  Mt.  Vernon,  New  York,  were  also  tested.  Final  testing  and 
assembling  of  criteria  of  musical  ability  on  the  secondary  school 
level  was  done  at  the  two  Horace  Mann  Schools  of  Teachers  College, 
Columbia  University,  New  York  City,  and  the  High  School  of  Music 
and  Art,  New  York  City.  Additional  secondary  schools  tested 
were  Teaneck  and  Bergenfield  in  New  Jersey,  and  Virginia, 
Minnesota. 

On  the  college  level,  selected  and  unselected  groups  of  students 
at  the  New  Jersey  State  Teachers  College  at  Jersey  City  were  tested. 
Test  data  and  various  criteria  on  the  conservatory  and  graduate 
levels  were  obtained  from  the  Juilliard  School  of  Music,  New  York 
City,  and  the  Music  Department  of  Teachers  College,  Columbia 
University. 


CHAPTER I

ESTABLISHING  A  FRAME  OF  REFERENCE 

Stimulus  for  the  Work 

The  stimulus  and  inspiration  for  this  research  have  developed 
from  several  sources.  First,  and  probably  foremost,  has  been  an 
awareness  of  a  distinct  need  in  music  teaching  for  more  accurate 
means  of  appraising  and  testing  ability  in  music.  The  increasing 
importance  of  instrumental  and  vocal  music  in  the  secondary  schools, 
and,  in  recent  years,  the  mounting  stress  upon  music  as  a  cultural 
pursuit  in  general  education,  have  increased  the  need  for  and  the 
value  of  music  testing  in  schools  and  colleges.  The  success  of  test- 
ing techniques  and  methods  in  other  fields,  moreover,  has  lent  en- 
couragement to  the  belief  that  results  equally  valuable  and  far- 
reaching  might  be  developed  in  the  field  of  music. 

In  the  field  of  music  testing,  however,  all  has  not  been  serene. 
Some  of  the  assumptions  involved  in  music  testing  have  been  strongly 
contested,  and  an  examination  of  the  pages  of  the  Music  Educators 
Journal  from  1936  to  1939  reveals  the  extent  to  which  the  editors 
of  that  publication  have  attempted  not  only  to  acquaint  its  readers 
with  the  values  of  music  testing,  but  to  present  conflicting  issues  in 
an  impartial  manner.  In  spite  of  much  effort  expended  in  debate, 
and  in  certain  quarters  with  no  little  emotion,  there  is  still  much 
which needs clarification. The solution of problems of testing and
evaluation in music, and of the interpretation of test results, is therefore
urgently needed.

No  little  stimulus  and  enlightenment  for  the  present  study  have 
been  occasioned  by  recent  advances  in  psychological  thinking  re- 
garding the  nature  of  musical  thought  processes.  These  develop- 
ments, together  with  an  awareness  of  the  progress  so  far  made  in 
the  field  of  music  testing,  have  helped  in  establishing  a  frame  of 
reference  for  this  research.  This  frame  of  reference,  moreover,  has 
been  explicit  enough  to  provide  orientation  and  direction  for  suc- 
cessive problems  as  they  arose  throughout  the  course  of  the  study. 

The  Area  Studied 

This  entire  research  has  been  concerned  with  the  general  area  of 
tonal  relationships,  or  that  phase  of  music  having  to  do  with  the 
melodic  and  harmonic  structure  of  music  exclusive  of  rhythmic 
patterns. Within this area are a number of types of musical
responses or functions. Mursell¹ has listed functions regarded by
psychologists and research workers as significant indices of musicality,
many of which deal with aspects of tonal relationships. The
ability  to  differentiate  between  musical  intervals  has  been  selected 
from  this  list  for  exploration  in  this  study.  This  ability  was  chosen 
with  due  regard  to  its  significance  in  the  hierarchy  of  responses 
called  into  play  in  practical  musical  experience.  The  philosophy 
underlying  this  investigation  has  been  to  regard  the  ability  of  in- 
terval discrimination  as  of  great  functional  importance  in  the  per- 
ception of  tonal  relationships,  if  not,  indeed,  basic  to  such  perception. 
A  test  of  the  ability,  therefore,  was  believed  to  hold  promise  of 
diagnostic  and  prognostic  value  in  music  instruction. 

The  significance  of  the  function  of  interval  discrimination  is  dis- 
cussed later  in  this  chapter.  Since  the  investigation  has  been  con- 
cerned with  a  test  of  musical  aptitude,  there  is  need  at  this  point  to 
review  some  concepts  of  the  nature  of  musicality  and,  thence,  to 
establish  a  point  of  view  on  the  nature  of  musical  ability.  A  review 
of  some  general  and  specific  concepts  of  musical  ability  might  ap- 
propriately start  with  the  common  and  unsophisticated  concept 
of  the  layman. 

Concepts  Regarding  the  Nature  of  Musicality 
The  Layman's  Concept 

"An ear for music" quite often sums up the layman's concept of
natural  or  spontaneous  musical  talent.  The  expression  implies 
something  more  than  the  mere  use  of  sound  and  the  activity  of  the 
hearing  mechanism  as  such.  What  seems  to  be  implied  in  this 
expression  is  a  natural  grasp  and  understanding  of  the  patterns  of 
music as expressed through sound. "An ear for music" in common
parlance  implies  that  its  possessor  is  able  to  respond  intelligently 
and  meaningfully  to  the  thoughts  and  ideas  expressed  in  music. 
Where  the  urge  for  performance  predominates,  one  possessing  this 
talent  is  able  through  sheer  resourcefulness  and  inventiveness  to 
wheedle  music  out  of  anything  capable  of  producing  agreeable 
sound,  whether  it  be  by  means  of  his  own  vocal  apparatus  or  other 
media  of  musical  expression.  Such  a  concept  of  musical  ability  is 
hardly  scientific  for  our  own  purposes,  but  such  homely  observa- 
tions may  aid  us  in  keeping  our  feet  on  the  ground  when  the  issues 
tend  to  become  more  obscure  and  complicated. 


1  Mursell,  James  L.  The  Construction  of  A  Test  of  Musical  Aptitude.  Un- 
published manuscript  presented  at  the  Little  Academy,  Columbia  University, 
Jan.  20,  1937. 
The Concept of the Musician-Teacher

The  teacher  of  music,  being  a  musician,  is  a  little  more  specific 
about  his  concept  of  musical  ability.  While  he  must  and  does  con- 
sider physique  and  muscular  coordination,  he,  too,  gives  every  evi- 
dence of  considering  a  musical  ear  as  the  focal  point  around  which 
all  activity  in  music  centers.  Judging  from  available  literature, 
the  musician  appears  to  use  no  single  term  for  the  essence  of 
musicianship.  The  answer  to  this  question  is  perhaps  better  found 
in  what  the  musician  strives  for  in  his  teaching  rather  than  in  what 
he  says.  It  is  true,  of  course,  that  the  musician  is  concerned  ulti- 
mately with  standards  of  accuracy  in  the  sensation  and  perception 
of  sound.  He  insists  on  proper  intonation  through  correct  pitch 
placement,  upon  artistic  rendition  of  dynamics  through  the  use  of 
appropriate  intensities  of  sound,  and  upon  pleasing  vocal  or  in- 
strumental quality  through  controlled  production  of  timbre.  All 
of  these  aspects  of  sound  production  belong,  indeed,  in  the  realm  of 
physics,  but  for  the  musician  their  control  emanates  from  an  edu- 
cated and  stimulated  musical  imagination  built  up  through  a  study 
and  appreciation  of  a  wide  range  of  tonal,  rhythmic,  and  dynamic 
relationships.  Also,  in  effect,  the  musician-teacher  implies  that  what 
must  successfully  be  interpreted  or  performed  in  the  realm  of  music 
must  first  be  heard  mentally  and  in  the  imagination. 

The  practical  value  of  such  mental  activity  and  tonal  imagery 
in  the  equipment  of  the  prospective  music  student  may  be  appreci- 
ated by  considering  the  emphasis  throughout  all  music  education 
upon  work  in  music  theory.  Specific  courses  in  this  category  are 
variously  designated  as  solfeggio,  ear  training,  dictation,  harmony, 
and  composition,  and  appear  in  the  curriculums  of  all  outstanding 
schools  of  music.  One  of  the  underlying  purposes  of  such  study  is 
to increase facility in, and appreciation of, the essentials of musical
structure  or  form.  The  educational  philosophy  back  of  such  study 
is  that  a  person  becomes  a  more  intelligent  musician  through  this 
mental  development.  It  is  significant  that  current  practice  in  the 
teaching  of  theory  encourages  the  student  to  hear  music  in  his  mind 
preparatory  to  reducing  it  to  written  notation.  Here  is  seen  an- 
other indication  of  the  mental  basis  of  musical  understanding  and 
ability. 

In  the  field  of  musical  literature  many  instruction  books  now 
emphasize the use of melody, including some examples of popular
or  folk  music,  in  contrast  to  the  former  use  of  abstract  technical 
material. In methods of this kind the approach to technical problems
is made either through the use of familiar melodic material or
through  material  intrinsically  melodic  in  nature.  The  use  of  melody, 
with  its  easier  grasp  of  musical  pattern,  aids  the  student  to 
form  mental  images  of  what  he  is  to  play  before  he  plays  it,  thus 
facilitating  technical  problems  associated  with  performance.  This 
use  of  melody  points  to  an  emphasis  in  another  direction  upon  the 
need  for  and  the  importance  of  mental  imagery  of  musical  patterns. 

Typical  of  this  methodology  is  Samuel  Gardner's  Violin  Method 
entitled Harmonic Thinking.² The aim of these studies, according
to  the  author,  is  to  develop  a  sense  of  relative  intonation  through  the 
process  of  harmonic  thinking.  Gardner  contends  that  there  are  no 
unrelated  tones  in  music.  In  another  publication  he  scores  the 
practice  of  attempting  to  remedy  faulty  intonation  on  the  violin 
through  exhorting  the  student  to  play  higher  or  lower  in  reference 
to any given external standard.³ In his experience, Gardner states,
the  ear  has  shown  itself  to  be  the  truest  guide  to  intonation  in  music, 
provided,  he  adds,  that  the  ear  has  a  clear  musical  conception  in  the 
first  place. 

The  difference  in  principle  between  the  concept  referred  to  by 
Gardner  as  harmonic  thinking  and  the  older  concept  of  developing 
correct  pitch  through  comparison  with  external  absolute  values  is 
a  vital  one  for  the  present  study.  On  the  one  hand  we  are  con- 
fronted with  a  need  for  a  mental  synthesis  of  musical  relationships 
or  values.  On  the  other  hand  we  find  reliance  on  or  conformity  to  a 
set  of  absolute  extrinsic  values.  Psychologically  the  former  involves 
mental  activity  or,  stated  another  way,  a  species  of  musical  intelli- 
gence or  thought.  Conformance  to  an  external  standard  calls  for 
little  in  the  way  of  mental  effort  of  a  musical  nature,  but  consists 
chiefly  in  the  relatively  simple  aural  attention  needed  for  making 
physical  or  acoustical  comparisons  with  a  given  frequency  standard. 
The  frequency  standard  in  this  case  may  be  given  by  the  instructor 
on  the  piano  or  any  other  instrument  at  hand.  At  other  times, 
through  good  fortune,  the  student  himself  may  possess  this  standard 
through  a  sense  of  absolute  pitch.  While  great  musicians  have  been 
known  to  possess  a  sense  of  absolute  pitch  there  is,  nevertheless,  a 
feeling on the part of many music educators that far too much attention
has been placed upon certain isolated functions, as for example
this sense of absolute pitch. In this one instance we find,
as exemplified in Gardner's teaching, that instead of emphasis upon
pitch placement there is a stress on pitch or tonal relatedness, without
which no real accuracy of intonation may take place.⁴

2 Gardner, Samuel. School for Violin Study Based on Harmonic Thinking.
Carl Fischer, Inc., New York, 1939.

3 Gardner, Samuel. "Violin Playing and Teaching Based on the Principles
of Harmonic Thinking." Music Teachers National Association Series, 34, 1939,
pp. 347-353.

In  the  field  of  music  instruction,  therefore,  it  appears  significant 
that  the  common  concept  of  "playing  by  ear,"  once  generally 
looked  down  upon  as  unbecoming  a  serious  musician,  now  comes  into 
its  own  as  a  psychological  device  for  musical  learning.  Further- 
more, emphasis  on  feeling  for  tonal  relatedness,  judging  by  the  im- 
portance placed  upon  it  in  theory  courses  and  in  instrumental 
instruction,  would  appear  to  be  one  of  the  fundamental  bases  for 
musical  talent  and  behavior.  Consequently  there  appears  to  be  no 
ready  term  for  the  essence  of  musicality  from  the  musician's  point 
of  view  unless  we  use  the  term  tonal  imagery  and  forthwith  proceed 
to  define  it  by  referring  to  specific  activities  carried  on  in  music  and 
music  education. 

The  Concept  of  the  Psychologists 

Our  previous  discussion  may  have  suggested  that  concepts  of 
laymen  and  of  musician-teachers  represented  within  their  respective 
groups  fairly  unanimous  points  of  view.  How  true  this  is  may  be 
open  to  question.  Nevertheless  there  are  ample  grounds  for  be- 
lieving that  these  groups  are  far  more  homogeneous  in  outlook  than 
are  the  psychologists  and  test  specialists  on  the  nature  of  musicality. 
Space  forbids  the  presentation  of  any  more  than  a  bare  outline  of 
major  issues.  Investigation  by  the  latter  groups  has  been  con- 
cerned with  two  general  approaches  to  the  study  of  musicality,  the 
physiological  and  the  psychological,  although  many  studies  embrace 
both. The early investigations of Galton, Helmholtz, and others
were concerned almost exclusively with the sensations and perceptions
of sound in relation to the physical stimulus. A classic example
of this is Helmholtz's experiments with finely differentiated
tuning  forks  for  the  purpose  of  measuring  acuity  of  sensory  dis- 
crimination of  sound.  Seashore  used  similar  laboratory  methods 
for  testing  pitch,  and  devised  additional  tests,  the  majority  of  which 
measured sensitivity to other differences of the sound wave. Seashore
constructed six separate tests for his original test battery.⁵
By means of phonograph recordings it was possible to utilize these
laboratory tests in any place where a phonograph was available.
The Kwalwasser-Dykema test battery⁶ utilizes a number of possibilities
for testing sensitivity to variations of the sound wave, and,
in addition, includes some types of response more closely associated
with musical structure and organization.

4 A fuller understanding of this topic may be obtained from any standard
work on the physics of sound under the topic of the just and the tempered scale.

5 Seashore, Carl E. Seashore Measures of Musical Talent. Columbia Phonograph
Company, New York, 1919. (Later C. H. Stoelting Co., Chicago, Ill.)

Although  it  is  true  that  Seashore  recognizes  important  areas  of 
ability  not  included  in  his  test  battery,  the  major  assumption  of  his 
Measures  of  Musical  Talents,  and  to  a  somewhat  lesser  extent  the 
Kwalwasser-Dykema  test,  is  that  sensory  discrimination  of  the  vari- 
ants of  the  sound  wave  is  the  basis  on  which  acquired  musical 
ability  may  be  developed.  Seashore's  first  battery  consisted  of  tests 
of  pitch,  intensity,  time,  tonal  memory,  rhythm,  and  consonance,  of 
which the first four involve capacities relating to sensory discrimination
of sound waves.⁷ The last two tests mentioned, rhythm and
consonance,  deal  with  configurations  or  relationships  within  the  area 
of  rhythm  and  intervalic  quality,  respectively.  The  consonance  test 
was  distinctly  related  to  certain  aspects  of  musical  structure,  but 
the  manner  of  response  on  the  test,  in  terms  of  consonance  and  dis- 
sonance values,  has  been  held  by  many  who  have  taken  the  test  to  be 
too "subjective." This subjectivity possibly accounts for its elimination
from the 1939 revision⁸ of the test. The test of timbre which
now  appears  in  the  revised  test  battery  is  another  measure  involving 
simple  sense  discrimination  of  the  sound  wave,  the  test  consisting  of 
paired  comparisons  of  different  tone  qualities  containing  measured 
differences  in  overtone  combinations  in  each  of  the  tones  presented. 
Thus  five  out  of  the  six  tests  of  the  revised  Seashore  battery  now 
measure  capacity  to  differentiate  between  physical  differences  of  the 
sound  wave  and  bear  little  psychological  relationship  to  patterns  of 
musical  structure. 

6 Kwalwasser, Jacob, and Dykema, Peter W. Kwalwasser-Dykema Music
Tests. Carl Fischer, Inc., New York, 1930.

7 An exception to this statement may possibly be made in the case of the
test of tonal memory, depending upon the manner of response utilized in taking
the test. The test measures memory for single notes played in short series.
Each series of notes is played twice and in the second series one note is changed.
If these note changes are perceived on the basis of memory for notes perceived
singly in the series, nothing but a sense image of a single note has been involved.
If the series is heard as a melodic pattern, a difficult thing in view of the lack
of simple tonal relationships between notes of the series, the recognition of note
change takes place through a function more nearly associated with musical
structure.

8 Seashore, Carl E., Lewis, Don, and Saetveit, Joseph G. The Seashore
Measures of Musical Talents. Educational Department, R. C. A. Mfg. Co., Inc.,
Camden, N. J., 1939.

There would appear to be no extensive body of experimental
work by any one experimenter or group of research workers in the
study of the purely psychological aspects of musical perception.
However, a vast amount of scattered but significant research has
been gathered and presented by Mursell,⁹ who has synthesized and
interpreted these data into a distinct concept of the psychology of
music. Mursell states his position as follows:

This  crucial  fact,  that  we  hear  mentally  created  patterns  rather  than  im- 
posed sensations — that  the  mind  selects  and  organizes  and  gives  shape  to  what 
we  hear — is  the  foundation  of  all  musical  organization  and  the  secret  of  the 
expressive possibilities of music.¹⁰

Mursell,  therefore,  sees  a  gap  between  responses  involving  simple 
sensation  and  those  which  perceive  musical  relationships.  This  gap, 
he  points  out,  can  be  accounted  for  only  by  mental  activity  which 
functions  by  means  of  selection  and  synthesis  of  sound  stimuli  in 
the  conception  of  meaningful  musical  configurations.  Throughout 
his  whole  work  Mursell  sees  great  danger  in  confusing  the  laws  of 
the  physics  of  sound  with  the  psychological  laws  governing  the  per- 
ception of  musical  values  expressed  through  the  medium  of  sound 
stimuli. 

There  are  some  tests  which  reflect  the  point  of  view  that  valid 
measurement  in  music  should  sample  functions  of  behavior  observed 
in  a  musical  setting.  Prominent  in  this  category  of  tests  is  the 
Drake Test of Musical Memory,¹¹ Drake's assumption being that
musical  memory  is  a  basic  factor  of  musical  perception.  Drake  tests 
memory  for  two-measure  melodies  which  the  test  administrator 
plays  on  the  piano.  This  test  and  others  of  a  similar  nature  attempt 
to  measure  responses  to  musical  patterns.  A  more  recent  test  is  the 
Knuth Achievement Test,¹² which, according to its author, measures
comprehension  and  recognition  of  music  from  its  notation. 

In  psychology  and  in  music  testing,  therefore,  two  schools  of 
thought  appear.  While  their  functions  necessarily  overlap  to  some 
extent,  emphasis  upon  indices  of  musicality  appears  fairly  distinct. 
The  measures  of  musical  talent  selected  by  Seashore  for  his  tests 
reflect  an  emphasis  upon  sensory  capacity  for  the  differentiation  of 
sound  stimuli.  Seashore's  assumption,  by  virtue  of  the  selection  of 
these  indices,  appears  to  be  that  persons  possessing  high  degrees  of 
sensory keenness are to that extent capable of corresponding advancement
in musical development. The implications of tests similar
to the Drake Memory Test are that indices of musical aptitude
are best determined by measuring responses observed while functioning
in a musical setting or situation. There is no disposition at
this point to deny the correlation of observed musical ability with
measures of either type of response. A clear differentiation exists,
however, between testing methods based on one type of response or
the other.

9 Mursell, James L. The Psychology of Music. W. W. Norton and Co.,
Inc., New York, 1937.

10 Ibid., p. 50.

11 Drake, Raleigh M. Drake Musical Memory Test. Public School Publishing
Co., Bloomington, Ill., 1934.

12 Knuth, William E. Knuth Achievement Test in Music. Educational
Test Bureau, Inc., Minneapolis and Philadelphia, 1936.

Summary  of  Concepts  on  the  Nature  of  Musicality 

From  the  point  of  view  of  the  layman  and  the  musician,  musi- 
cality appears  to  be  an  ability  emanating  from  a  fertile  and  stimu- 
lated imagination.  To  the  musician,  especially,  it  is  an  affair  of  the 
mind  particularly  active  in  musical  imagery.  A  certain  school  of 
psychological  thought  adheres  to  the  assumption  that,  by  virtue  of 
the  medium  of  musical  expression,  musical  talent  is  controlled  and 
made  possible  through  sensitivity  to  physical  differences  of  the 
sound  wave.  Another  school  of  psychological  thought  regards  the 
true basis of musical talent as being the power of mental synthesis
of  the  materials  and  structure  of  music  as  expressed  in  this  medium 
of  sound.  This  school  does  not  discount  the  importance  and  need 
for  sensory  keenness.  Persons  deficient  in  such  capacities  for 
physiological  or  pathological  reasons  would  obviously  be  handi- 
capped in  using  the  medium  of  sound  for  purposes  of  musical  ex- 
perience or  enjoyment.  However,  the  positive  possession  of  sensory 
powers,  according  to  this  latter  school,  does  not  necessarily  imply 
that  future  musical  development  is  assured,  although  some  correla- 
tion of  the  two  is  necessarily  present. 

Point  of  View 

Enough  has  been  presented  of  the  psychological  and  research 
background  for  this  study  to  make  intelligible  the  point  of  view 
which  has  obtained  throughout  its  entire  course.  Consistent  with 
this  point  of  view  will  be  a  position  on  the  basic  nature  of  musical 
experience  together  with  some  assumptions  which  have  special  bear- 
ing upon  the  specific  problems  of  this  study. 

The  position  of  the  writer  from  the  very  start  has  been  that  valid 
musical  experience  takes  place  through  definite  thought  processes 
involving  tonal-rhythmic  patterns  and  has  its  fruition  in  the  func- 
tioning of  a  musical  intelligence.  Due  recognition,  of  course,  is 
accorded to the various sensory factors such as tone quality, instrumental
color, and the infinite possibilities for blending and contrasting
various timbres of instruments. The position is held, however,
that  experience  of  a  strictly  musical  nature  takes  place  only  through 
the  organization  into  meaningful  patterns  of  the  various  tonal  and 
rhythmic  relationships  of  musical  composition. 

Such  a  position  is  parallel  to  the  concept  generally  held  in  psy- 
chology and  educational  philosophy  that  everything  which  the  mind 
assimilates  as  experience  takes  place  through  a  process  of  mental 
synthesis  of  sensory  perceptions.  There  appears  to  be  no  valid 
reason  why  musical  experience  should  not  function  in  the  same 
manner  as  all  other  types  of  human  experience.  The  point  of  view 
of  this  study,  therefore,  is  in  accordance  with  Mursell's  contention 
that  in  music  we  hear  mentally  created  patterns  and  not  imposed 
sensations. 

Other  Assumptions 

An  obvious  corollary  of  the  foregoing  position  is  that  measures 
of  sensory  keenness  offer  only  a  partial  account  of  either  potential 
or  demonstrated  ability  in  music.  Research  has  yet  to  tell  us  how 
much  sensory  equipment  on  the  one  hand,  and  how  much  psychologi- 
cal equipment  for  musical  values  on  the  other  hand,  are  needed  for 
successful  pursuit  of  music  study.  Validity  coefficients  of  tests  of 
sensory  capacity,  such  as  the  Seashore  test  battery,  are  not  high 
enough for us to pin our faith entirely on this type of testing.¹³, ¹⁴
A  point  of  view  underlying  the  present  research  study,  conse- 
quently, is  that  success  in  testing  musical  aptitude  might  be  much 
better  assured  when  truly  psychological  functions  employed  in  the 
understanding  of  musical  values  are  selected  for  use  as  responses  in 
formal  testing. 

Our  stated  position  regarding  the  nature  of  musical  intelligence 
permits  of  making  certain  additional  assumptions.  First,  we  may 
assume  that  in  testing  any  given  function  of  a  musical  nature  we 
shall  obtain  a  response  which  is  the  product  of  the  interaction  of  the 
meaningfulness  and  extent  of  past  experience  in  music  with  a  natu- 
ral, innate  aptitude  or  predilection  for  such  specific  ability.  Second, 
we  may  have  reasonable  confidence  that  responses  involving  the 
discrimination  of  musical  intervals  will  come  within  the  scope  of  the 
foregoing  assumption.     Third,  in  any  interpretation  of  test  results 


13 Farnsworth, Paul R. "An Historical, Critical and Experimental Study
of  the  Seashore-Kwalwasser  Test  Battery."  Genetic  Psychology  Monographs, 
Vol.  9,  No.  5,  May,  1931. 

14  Mursell,  op.  cit.,  p.  296. 
there will be no basis for separate consideration of either innate or
acquired  ability. 

There  would  appear  to  be  but  one  choice  in  the  definition  of  the 
functions  measured,  and  that  is  to  state  test  results  in  terms  of  apti- 
tude. Aptitude  is  defined  in  Warren's  Dictionary  as  "a  condition 
or  set  of  characteristics  regarded  as  symptomatic  of  an  individual's 
ability  to  acquire  with  training  some  knowledge,  skill,  or  set  of 
responses  such  as  the  ability  to  speak  a  language,  to  produce  music, 
etc." Bingham holds that aptitudes are fairly stable and that variations
will occur "within limits which can often be ascertained in
advance."¹⁵ For the most part the present study has been pursued
with  the  belief  that  some  such  stability  exists.  There  was  an  attempt 
in  the  beginning,  however,  to  explore  the  limits  of  variation  of 
Bingham's  definition  by  constructing  and  administering  a  short 
learning  situation,  but  this  phase  of  research  was  dropped  in  favor 
of  a  single  measure  of  ability. 

Influence of the Point of View on the Course of Study

The  position  taken  had  a  definite  influence  on  steps  followed 
throughout  the  study.  All  responses  had  to  be  musically  significant 
to  be  accepted  at  all.  In  other  words,  each  response  selected  for 
study  must  embody  some  mental  process  employed  in  the  discern- 
ment of  musical  form  or  structure.  Furthermore,  to  avoid  having 
test  results  influenced  by  specific  aspects  of  musical  training  or 
experience,  care  was  taken  to  obtain  responses  as  little  dependent  on 
particular  training  or  experience  as  possible.  Since  the  purpose  of 
the  study  called  for  the  examination  of  the  function  of  interval  dis- 
crimination as  an  index  of  ability  in  more  complex  areas,  there  was 
no  need  to  present  a  situation  of  recognizable  melodic  or  composi- 
tional form.  The  only  requirement  was  that  responses  should  repre- 
sent perceptual  processes  employed  in  the  tonal  aspects  of  musical 
experience.  It  was  also  very  desirable  that  responses  be  sufficiently 
isolated  by  proper  test  techniques  in  order  that  the  test  as  finally 
developed  might  be  indicative  of  specific  types  of  musical  behavior. 
By  carrying  out  this  latter  objective  the  diagnostic  or  prognostic 
value  of  the  test  in  educational  practice  could  more  easily  be  assured. 

It  has  already  been  noted  in  the  introduction  that  there  was 
nothing  at  the  outset  of  the  study  which  suggested  a  clear-cut  experi- 
mental design  or  technique.    Throughout  the  exploratory  phase  of 


15  Bingham,  Walter  Van  Dyke.     Aptitudes  and  Aptitude  Testing,  p.   33. 
Harper  and  Brothers,  New  York,  1937. 
the study each succeeding step was formulated and determined
through  the  application  of  the  general  point  of  view  on  the  nature 
of  musicality  to  leads  or  possibilities  as  they  presented  themselves. 
While  a  certain  weakness  might  be  attributed  to  such  procedure, 
it  seems  to  be  in  part  justified  by  the  results  achieved. 

Types  and  Sources  of  Data 

Both  group  and  individual  tests  of  a  preliminary  nature  were 
developed  and  used.  These  tests  supplied  data  on  the  range  and 
extent  of  ability  tested.  They  were  also  used  to  obtain  tentative 
validity  coefficients  with  various  indices  of  musical  ability  available 
in  schools  where  experimentation  was  carried  on.  Individual  tests, 
on  the  other  hand,  were  used  to  study  the  nature  and  extent  of 
individual  response.  The  introspection  of  persons  tested  in  this 
manner  also  yielded  valuable  clues  to  musical  thought  processes  and 
incidental  personal  reactions  by  those  taking  the  tests.  Thus,  each 
succeeding  test,  whether  group  or  individual,  was  constructed  from 
the  suggested  clues  and  quantitative  data  derived  from  the  previous 
test  efforts,  data  consisting  of  subjective  as  well  as  objective  infor- 
mation. Data  from  three  experimental  tests  incorporating  an  ex- 
perimental design  served  as  material  for  the  construction  of  a  final 
test. 

In  the  administration  of  the  final  test,  scores  were  obtained  on 
samples  of  population  extending  from  the  sixth  grade  in  elementary 
schools  to  selected  music  groups  on  the  conservatory  and  graduate 
school  level.  Criteria  for  the  validation  of  the  test  in  the  form  of 
teacher  estimates  and  class  marks  were  obtained  from  various  classes 
in  musical  theory  and  music  appreciation.  Scores  from  various 
music  groups  and  certain  other  population  groups  were  studied  to 
determine  the  existence  of  significant  differences  between  them. 
Scores  on  certain  existing  tests  of  music  were  also  obtained  in  order 
to  determine  the  relationship  of  the  function  measured  by  the  test 
with  these  other  measures  of  musical  ability. 

The  Significance  of  the  Function  of  Interval 
Discrimination 

We  may  now  discuss  with  some  degree  of  understanding  the  sig- 
nificance of  the  choice  of  interval  discrimination  as  a  basic  element 
in musical thought. We have previously noted that only responses
associated  with  active  experience  in  the  form  and  structure  of  music 
would  be  considered  in  the  study.    A  brief  examination  of  certain 
psychological aspects of the musical interval will help to justify the
selection  of  this  function,  and  will  help  to  show  its  close  relationship 
to  important  functions  involved  in  musical  perception. 

A Psychological Function

Webster's New International Dictionary defines the musical
interval as "the relation of two tones with regard to pitch, especially
as represented in their notation." In the perception of the
musical  interval  itself,  we  find  a  psychological  function  of  musical 
thought  of  the  most  elementary  and  basic  sort.  Each  musical  inter- 
val, made  up  as  it  is  of  two  individual  tones,  carries  with  it  a  con- 
figuration of  tonal  relationship  or  quality  readily  perceivable  by 
musical  persons.  This  configuration  is  more  than  the  sum  total  of 
two  individual  tones;  it  is  a  tonal  relationship  perceived  only  by 
virtue  of  mental  synthesis.  It  is  quite  conceivable  that  persons 
deficient  in  this  function  are  unable  to  derive  any  definite  meaning 
or  significance  from  hearing  two-toned  combinations  or  intervals. 
It  is  even  possible,  since  research  has  not  convinced  us  otherwise, 
for  persons  with  keen  discrimination  for  differences  of  sound  waves 
to  be  deficient  in  the  ability  to  organize  musical  relationships  be- 
tween two  or  more  tones.  Hence,  if  we  are  to  search  for  indices  of 
musical  ability  we  do  well  to  pass  by  for  the  moment  elements  of 
hearing  associated  with  sensory  acuity  and  to  test  functions  involv- 
ing feeling  for  musical  relationships. 

In  a  sense,  intervalic  quality  lies  at  the  very  root  of  our  whole 
system  of  tonality,  or  of  atonality  for  that  matter.  Heard  in  se- 
quence, the  two  tones  of  a  musical  interval  form  the  tonal  basis  of 
larger  patterns  of  melodic  contour,  a  single  interval  progression 
constituting  psychologically  the  smallest  unit  of  melodic  thought  or 
meaning.  Total  melody  can  hardly  be  grasped  without  proper 
apprehension  of  the  individual  intervals  which  go  to  make  up  the 
larger  whole.  This  statement  is  at  once  apparent  if  we  take  as  an 
example  the  singing  by  a  student  of  an  exercise  in  a  solfeggio  class. 
Singing  the  exercise,  let  us  suppose  he  misjudges  one  single  interval 
by singing a wrong note. He thus sings quite another melody
through  this  one  alteration.  His  mistake  may  arise  from  misjudg- 
ment  of  either  intervalic  quality  or  the  pitch  distance  of  the  interval 
of  which  the  note  is  a  part,  or  both.  Through  failure  to  grasp  the 
tonal  significance  of  this  one  single  interval  his  psychological  and 
musical  perception  of  the  original  total  melodic  pattern  becomes 
invalid. 
Furthermore, when heard simultaneously the two notes of the
musical interval also constitute the simplest unit of chord construction.
The musical interval is, theoretically, a bi-chord and many
theorists regard it so, for intervals possess intrinsic harmonic qualities
just as do triads and larger groupings of tones.

The  two-tone  relationship  known  as  the  interval  thus  becomes  the 
basic  perceptual  unit  of  those  larger  patterns  of  musical  thought  we 
recognize  as  melody  and  harmony.  It  is  quite  understandable, 
therefore,  that  in  the  teaching  of  theory,  with  its  emphasis  upon 
various  aspects  of  melody,  harmony,  counterpoint,  and  musical 
composition,  the  ability  to  think  in  terms  of  intervals  should  be  basic 
to  such  study.  Properly  learned,  recognition  and  appreciation  of 
the  various  properties  and  possibilities  of  musical  intervals  form  the 
basis  for  intelligible  experience  and  creative  effort  in  the  world  of 
tonal  relationships. 

Types  of  Intervalic  Perception 

Feeling for interval quality has been variously described. Ortmann¹⁶
lists two types of response: perception of pitch distance,
and fusion or blending. Valentine,¹⁷ using intervals for studying
certain  differences  between  musical  and  nonmusical  groups,  directed 
subjective  responses  to  the  consonance  aspects  of  various  two-toned 
combinations.  Seashore  in  his  first  test  battery  used  intervals  in  his 
consonance  test,  and  directed  test  response  to  degrees  of  consonance 
and  dissonance.  However,  a  review  of  the  research  work  in  which 
musical  intervals  are  used  as  a  basis  for  testing  indicates  that  results 
have  been  too  unreliable  for  use  in  formal  testing.  Valentine's 
results,  while  showing  certain  trends,  gave  no  promise  of  reliability 
for test purposes, and the low reliability and validity coefficients¹⁸
of  the  Seashore  Consonance  Test  were  probably  the  reason  for  its 
elimination  in  the  1939  revision  of  that  test  battery. 

The  Significance  of  the  Criteria  Used  for  Validation 
of Test Measures

It  has  already  been  pointed  out  that  the  study  of  musical  theory 
in  colleges  and  conservatories  occupies  an  important  place  in  the 
curriculum  and  contributes  to  important  aims  in  musical  education. 


16  Ortmann,  Otto.  Research  Studies  in  Music,  No.  2,  October,  1934,  pp.  67, 
68.     Peabody  Conservatory  of  Music,  Baltimore,  Md. 

17 Valentine, C. W. "The Aesthetic Appreciation of Musical Intervals
Among  School  Children  and  Adults."  British  Journal  of  Psychology,  No.  6, 
1913,  pp. 190-216. 

18  Mursell,  op.  cit.,  pp.  292,  296. 
The details of a report by Larson¹⁹ show that the Theory I course
at  the  Eastman  School  of  Music  is  basic  to  the  entire  music  curricu- 
lum. For  this  and  other  reasons  criteria  obtained  from  classroom 
work  in  these  subjects  possess  high  merit  for  use  in  validation  pro- 
cedures. Because  of  the  intensive  and  directed  nature  of  such 
study,  teacher  estimates  and  course  grades  are  based  on  a  compara- 
tively large  number  of  separate  observations,  many  more,  in  fact, 
than  would  be  possible  to  secure  in  the  average  controlled  experi- 
ment. These  estimates,  moreover,  are  based  on  a  period  of  observa- 
tion at  least  one  semester  in  length.  While  they  still  may  be  re- 
garded as  being  subjective,  they  do  offer  the  most  reliable  and  valid 
measures  of  student  ability  possible  in  this  particular  area.  Further 
discussion  on  the  significance  of  criteria  is  presented  in  Chapter  IV. 

The  Significance  of  the  Capacity  for  Tonal  Imagery 

Perception  of  the  various  properties  of  musical  intervals  is 
linked  with  the  whole  area  of  tonal  imagery.  Seashore  himself  is 
quite  explicit  on  the  significance  of  tonal  imagery  as  an  essential 
part  of  musical  talent.  His  statement  is  so  positive  on  this  point 
that it will be worth while to quote it fully.²⁰

Commenting  on  the  possibilities  of  using  measures  of  tonal 
imagery as an index of musical ability, and of the need for objectivity
in its measurement, Seashore has this to say:

If  it  were  adequately  measurable  and  I  were  limited  to  a  single  index  to 
musical  talent,  I  would  take  the  record  of  natural  capacity  for  tonal  imagery. 
On  account  of  the  demands  for  objectivity,  current  psychology  has  given  but 
slight  attention  to  this  exceedingly  important  factor. 

Continuing,  Seashore  points  out  the  difference  in  the  use  of  tonal 
imagery  in  the  experience  of  inferior  and  of  capable  musicians. 

An  inferior  musician  can  hear  and  perform  without  conscious  use  of  tonal 
imagery;  and  in  that  case  he  remembers,  images,  and  creates  music  in  terms  of 
names,  concepts  or  analogies  for  the  different  elements  of  a  tone.  A  real 
musician,  on  the  other  hand,  has  the  ability  to  reconstruct  the  tone  in  accurate 
detail  in  the  form  of  memory  images  and  can  imagine,  compose,  and  hold  up  for 
detailed  and  objective  scrutiny  the  tonal  situation  which  he  wishes  to  create. 
Between  these  two  extremes,  we  have  among  those  who  begin  training  for  music, 
a  normal  distribution  of  the  ability  to  retain,  relive,  and  create  music  without 
the  presence  of  the  physical  sound,  entirely  in  terms  of  the  mental  image. 

He  then  discusses  the  basis  for  such  talent  and  offers  suggestions 
for  its  improvement. 


19 See pp. 65 f.

20 Seashore, Carl E. "The Psychology of Music," Article No. XVIII.
Music Educators Journal, Vol. XXV, No. 4, Feb., 1939, pp. 23-24.
First is the fundamental fact that the musical mind is born with this talent
and  comes  into  the  interests  and  activities  of  music  by  natural  selection,  whereas 
the  scientist  gravitates  toward  a  career  in  which  visual  experiences  are  more 
dominant.  It  ordinarily  means  also  that  the  musician,  living  persistently  in 
tonal  experiences,  cultivates  this  ability.  The  psychological  fact  remains,  how- 
ever, that  the  degree  of  possible  development  depends  upon  the  degree  of  the 
inherited  talent,  which  varies  very  greatly  among  normal  individuals.  To  good 
musicians  the  auditory  image  is  so  commonplace  and  conspicuous  that  they  take 
it  for  granted,  just  as  they  take  it  for  granted  that  they  can  see  red  and  taste 
sour  or  hear  the  tone  when  it  is  physically  present.  As  a  result  they  seldom  give 
the  pupil  systematic  training  in  the  critical  use  of  images. 

Seashore now resolves tonal imagery into the same elements
which constitute the basis for his test battery.

Let us ask again: What does ability in tonal imagery mean in actual music?
In  the  first  place  the  image  has  the  same  four  elements  as  the  perception; 
namely,  pitch,  loudness,  duration,  and  timbre — or  in  their  complex  forms,  melody, 
harmony,  rhythm,  volume,  and  sonance  or  tone  quality.  Each  of  these  may  be 
inherited  and  developed  in  a  dominant  way  so  that  one  musician  lives  dominantly 
in  a  world  of  time  and  rhythm,  another  in  the  realm  of  dynamic  expression, 
another  in  terms  of  tone  quality. 

Seashore  then  states,  as  he  sees  it,  the  importance  of  tonal 
imagery as an index of genuine musical experience.

Second,  it  is  perfectly  clear  that  the  degree  to  which  a  person  can  accumu- 
late past  experiences  of  a  particular  tonal  characteristic  in  reproducible  images, 
is  an  index  to  the  degree  in  which  he  lives  his  musical  experience  realistically, 
can  scrutinize  his  present  performance  in  relation  to  these  experienced  goals,  can 
create  new  modes  of  expression  in  his  voice  or  instrument,  and  can  master  the 
tonal  structure  in  creative  music. 

Likewise,  musical  thinking  is  essentially  the  manipulation  of  images,  or 
pitch,  loudness,  time,  and  timbre  in  various  degrees  of  present  experience  of  these 
conceptions;  and,  most  important  of  all,  the  vividness  of  the  feeling  value  and 
emotional  quality  of  memory  and  imagery  of  music  is  contingent  upon  the  real- 
ism of  the  image  present. 

Third,  it  also  affects  the  hearing  of  tones.  Perception  of  tone  is  essentially 
an  act  of  reconstruction  in  terms  of  past  experiences;  and  if  these  come  only  in 
verbal  form,  they  will  be  correspondingly  empty  of  the  esthetic  discrimination. 

He concludes his argument by pointing out the educational and
guidance values of measures of tonal imagery in the development of
genuine musical talent in the public schools.

If  the  instructor  in  the  public  music  school  who  deals  with  young  aspirants 
has  a  clear  and  convincing  conception  of  the  role  of  tonal  imagery  and  can 
evaluate  it  to  some  degree,  he  cannot  only  see  an  explanation  of  a  large  part 
of  success  or  failure  and  likes  and  dislikes  of  music,  but  he  can  guide  the  student 
in  relation  to  outlets  in  the  direction  of  music. 

Taken  as  a  whole,  there  has  appeared  no  other  single  passage  in 
the  literature  of  the  psychology  of  music  which  seems  to  sum  up  so 
clearly the arguments which point to the function of tonal imagery
as a basic factor in musical ability. Seashore points out the lack of
objectivity which has surrounded the subject. He places due importance
on  the  power  of  imagery  of  tonal  relationships  without  the  use  of 
physical  sound.  He  feels  that  this  function  is  inherited,  a  state- 
ment not  out  of  harmony  with  the  philosophy  of  this  study,  and  he 
notes  that  such  ability  is  also  the  product  of  innate  capacity  and  the 
meaningfulness of past experience. This agrees with the point of
view  stated  earlier  in  the  chapter. 

Seashore  also  suggests  in  the  latter  part  of  this  quotation  that 
esthetic  discrimination,  frequently  spoken  of  as  appreciation,  de- 
pends upon  the  richness  of  tonal  imagery.  It  seems  not  impossible 
that  much  music  appreciation  springs  from  an  awareness  of  tonal 
values,  and  that  if  adequate  measures  of  these  values  can  be  secured, 
there  is  reasonable  assurance  of  judging  the  extent  of  appreciation, 
at  least  that  aspect  of  appreciation  which  emanates  from  tonal 
imagery. 

There is, therefore, most hearty agreement between the point of
view  of  this  study  and  that  of  Seashore  on  the  extreme  importance 
of  tonal  imagery  in  the  equipment  of  the  real  musician.  For  reasons 
which  have  already  been  pointed  out,  however,  it  appears  difficult 
to  follow  through  with  him  when  he  breaks  up  tonal  imagery  into 
the  sense  elements  of  pitch,  loudness,  duration  and  timbre.  This  is 
an  attempt  to  bridge  the  gap  between  the  physical  and  the  mental, 
between  the  physical  stimuli  of  sound  and  the  psychological  percep- 
tion of  musical  structure  as  expressed  and  presented  in  sound,  with- 
out taking  into  proper  account  the  psychological  basis  of  the  original 
function  of  tonal  imagery. 

However,  at  this  point  there  is  no  need  to  engage  in  further 
dissent.  Seashore  has  pointed  to  the  need  for  evaluation  in  this 
important  area  of  tonal  imagery  and  it  is  well  to  see  it  justified  in 
so  complete  a  manner.  One  conviction  underlying  the  present  study 
is  that  the  function  of  interval  discrimination  is  so  closely  related  to 
tonal  imagery  that  a  test  of  intervalic  discrimination  may  well  serve 
as  an  index  of  this  important  area  of  tonal  imagery.  If  it  is  found 
related  to  any  appreciable  degree,  it  becomes  an  important  and  spe- 
cific index  of  musical  ability  with  many  valuable  uses. 

Actual  work  on  this  study  began  with  an  examination  of  a  pub- 
lished test  of  intervals  which  seemed  to  embody  the  only  significant 
approach  to  the  problem  appearing  in  research  literature.  The 
following  chapter  opens  with  an  account  of  experiments  using  this 
test  as  a  basis  for  later  effort. 


CHAPTER II

PROGRESSIVE  STEPS  IN  EXPERIMENTATION 

The  material  in  the  present  chapter  has  been  assembled  for  two 
purposes.  First,  it  should  be  of  interest  and  significance  to  trace 
the  steps  and  to  observe  the  method  of  inquiry  used  in  setting  up 
the  various  test  situations  which  have  been  developed.  In  a 
problem  in  which  so  little  precedent  was  available  much  exploratory 
work  was  inevitable,  but  each  step  served  to  provide  clues  for  suc- 
ceeding steps.  Second,  many  of  the  test  situations  and  the  results 
which  they  yielded  offer  suggestions  to  the  serious  research  worker 
when  certain  other  objectives  of  study  are  set  up.  Only  those  data 
of  the  exploratory  work  which  contribute  most  directly  to  the  aims 
of  the  study  are  examined  here  in  detail.  Consequently  the  data 
in  some  instances  are  abbreviated.  Furthermore,  in  connection 
with  these  preliminary  data  no  tests  of  statistical  significance  were 
made  because  of  the  very  general  nature  of  the  attack  on  a  number 
of  separate  problems. 

Administration  of  Tests  Using  the  Technique  of  the 
Schoen Test of Relative Pitch

The Schoen Test of Relative Pitch¹ appeared to be the one test
of  intervals  in  research  literature  which  indicated  any  promise  of 
usefulness  in  the  present  study.  In  order  to  obtain  reactions  and 
other  information  on  this  test  it  was  given  to  a  group  of  ninety-one 
students  in  a  college  psychology  class.  The  term  relative  pitch  had  a 
familiar  ring  to  it  and  seemed  connected  with  the  material  brought 
out  in  the  first  chapter  in  the  discussion  of  musical  relationships  of 
various  types.  This  test  of  relative  pitch  is  one  of  three  tests  of 
musical  aptitude  constructed  and  reported  by  Schoen.  It  measures 
a function which the author describes as relative pitch through a
presentation  of  paired  intervals  played  in  sequence,  that  is, 
melodically.  The  test  consists  of  one  hundred  items.  Each  item 
consists  of  two  different  intervals.  The  subject  is  asked  "to  judge 
the  difference  in  the  pitch  of  the  two  tones."  The  second  interval 
is  to  be  compared  with  the  first,  the  subject  to  determine  whether 
it  is  smaller  or  larger.  The  first  part  of  the  test  is  organized  into 
six  series,  each  containing  ten  items,  and  a  definite  plan  of  pitch 


1  Schoen,  Max.     "Tests  of  Musical  Feeling  and  Understanding."     Journal 
of  Comparative  Psychology,  Vol.  5,  1925,  pp.  31-52. 
direction assigned to each series. The remaining forty items contain
a selection of the original sixty items and are used to study
reliability. 

It  was  considered  important  for  our  purpose  to  secure  observa- 
tions, comments,  and  certain  types  of  introspection  which  might 
throw  some  light  on  the  type  of  mental  reaction  developed  in  the 
taking  of  the  tests.  The  matter  of  errors  and  distribution  of  scores 
was  considered  secondary  at  this  point.  The  test  had  been  recorded, 
using  a  clarinet  for  the  playing  of  the  notes.  Conclusions  from  this 
testing,  together  with  clues  for  future  testing,  are  reported  at  the 
end  of  this  section. 

Adaptation of the Schoen Technique to the Elementary
School Level

At  the  same  time  that  experiments  were  carried  on  at  the  college 
level,  certain  adaptations  of  the  Schoen  technique  of  testing  inter- 
vals were  made  in  order  to  test  sixth-grade  children.  Instead  of  a 
wide  variety  of  paired  intervals,  a  selection  was  made  of  two  in- 
tervals, the  octave  and  the  perfect  fifth,  using  some  of  the  pitch 
directions  found  in  the  Schoen  test.  The  total  number  of  items  was 
reduced  to  thirty,  presented  in  three  groups  of  ten  each.  A  small 
portable  organ  was  used  to  provide  the  tones.  The  test  was  ad- 
ministered on  three  different  occasions  within  a  period  of  four  days 
in  order  to  determine  the  amount  of  improvement  which  might  be 
made  on  the  test.  Certain  drills  were  presented  before  and  after 
each  test,  using  the  two  intervals  as  drill  material.  Drill  consisted 
in  singing  and  listening  to  examples  of  the  two  intervals. 

Forty  sixth-grade  pupils  participated  in  the  tests.  The  means 
of  the  three  tests  were,  respectively,  11.7,  10.2,  and  8.7,  scores  being 
expressed  in  number  wrong  out  of  thirty  items.  These  decreasing 
means  appeared  to  indicate  that  the  class  as  a  whole  was  developing 
some  degree  of  proficiency  on  the  test  material.  Since  introspec- 
tions from  pupils  of  this  age  are  somewhat  misleading,  no  attempt 
was  made  to  secure  such  data. 
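
The arithmetic behind these means may be restated in terms of percent correct. The following Python fragment is an editorial sketch only and formed no part of the original procedure; the names TOTAL_ITEMS, mean_wrong, and percent_correct are hypothetical, and the figures are the three means reported above.

```python
# Mean number of items wrong (out of thirty) on the three administrations,
# taken from the paragraph above.
TOTAL_ITEMS = 30
mean_wrong = [11.7, 10.2, 8.7]


def percent_correct(wrong, total=TOTAL_ITEMS):
    """Convert a mean error count into a mean percent correct."""
    return 100.0 * (total - wrong) / total


# Mean percent correct on each administration.
pct = [round(percent_correct(w), 1) for w in mean_wrong]

# Decrease in mean errors from one administration to the next.
drops = [round(a - b, 1) for a, b in zip(mean_wrong, mean_wrong[1:])]

print(pct)    # [61.0, 66.0, 71.0]
print(drops)  # [1.5, 1.5]
```

On this reckoning the class mean rose by five percentage points at each retest. As the text notes of these preliminary data generally, no test of statistical significance was applied.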

Tentative  Conclusions  from  Tests  Using  the  Schoen  Technique 

From  the  two  test  administrations,  on  the  college  level  and  on 
the  elementary  school  level,  certain  observations  seemed  worth 
noting. 

Comments  of  college  students  suggested  the  following: 
1.  Paired  intervals  when  given  in  sequence,  as  in  the  Schoen  test, 
presented  certain  suggestions  of  a  melodic  nature,  the  consciousness 
of which tended to detract from the attention necessary for the
discrimination  of  interval  or  pitch  differences. 

2.  In  a  test  where  the  response  is  either  one  of  two  choices 
(smaller  or  larger  in  this  instance),  the  error  due  to  guessing  was 
felt  to  be  too  large.  Many  students  stated  that  considerable  num- 
bers of  paired  intervals  sounded  the  same,  and  in  those  instances 
responses  were  in  the  nature  of  guesses. 

3.  The  arrangement  of  intervals  in  melodic  sequence,  and  the 
directions  of  the  test  for  responses  in  terms  of  largeness  or  smallness 
of  the  interval,  tended  to  draw  attention  to  pitch  distance,  and  to 
detract  from  interval  quality. 

4.  When  listening  to  intervals  in  sequence,  certain  students  in- 
vented devices  to  obtain  their  answers.  A  common  device  reported 
was  the  thinking  of  intervals  in  terms  of  their  component  scale 
steps,  the  interval  being  "spelled"  by  means  of  the  do-re-mi
syllables  in  order  to  determine  the  total  distance.  This  mechanical 
device  was  felt  to  vitiate  any  test  of  interval  discrimination  insofar 
as  it  might  purport  to  measure  qualitative  perception  of  musical 
intervals,  or  even  pitch  distance  for  that  matter. 
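
The concern over guessing in comment 2 can be given a rough numerical form. The following is a minimal sketch, assuming pure guessing on independent items; the function name and the thirty-item test length are illustrative choices, not part of the Schoen procedure.

```python
import math

def chance_score(n_items, n_choices):
    """Expected number right, and its standard deviation, under pure guessing.

    Assumes every item is answered at random and independently, which is a
    simplification of how real subjects guess.
    """
    p = 1.0 / n_choices
    expected = n_items * p
    spread = math.sqrt(n_items * p * (1.0 - p))
    return expected, spread

# Illustrative case only: a thirty-item test with two possible responses.
expected, spread = chance_score(30, 2)
print(f"two-choice, 30 items: {expected:.1f} right by chance, s.d. {spread:.2f}")
```

On these assumptions half of the items would be answered correctly by chance alone, which is consistent with the students' feeling that the two-choice response left too much room for guessing.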

Results  of  grade  school  testing  suggested  the  following  observations:

1.  The  interval  of  an  octave  could  not  always  be  distinguished 
from  the  perfect  fifth  by  sixth-grade  children,  at  least  when  pre- 
sented in  melodic  sequence.  Consequently  this  type  of  interval 
discrimination  could,  if  necessary,  be  made  the  basis  of  a  test  situ- 
ation with  a  fair  assurance  of  revealing  an  appreciable  range  of 
pupil  ability  on  this  function. 

2.  The  ability  to  discriminate  between  octaves  and  fifths  could 
be  improved  within  the  limits  of  two  or  three  test  periods. 

Suggested  Patterns  for  Subsequent  Testing 

1.  Harmonic  (simultaneous)  presentation  of  intervals  was  to  be 
preferred  to  melodic  (sequential)  presentation,  because  increased 
attention  to  interval  quality  was  secured,  and  because  such  presenta- 
tion tended  to  reduce  distraction  due  to  aroused  melodic  suggestions. 

2.  Some  feature  of  a  learning  situation  could  profitably  be  incorporated
into  a  testing  situation  to  help  offset  the  advantage  of
previous  musical  experience  of  certain  pupils. 

3.  It  would  be  possible,  at  least  for  children  in  elementary 
schools,  to  construct  tests  based  on  the  recognition  of  octaves  and 
perfect  fifths. 

A  "Test  and  Learning"  Experiment

Based  upon  clues  derived  from  the  study  of  the  Schoen  testing 
technique,  a  "test  and  learning"  situation  was  devised  for  use  with
elementary  school  children.  A  small  portable  organ  was  used  for 
the  testing.  The  organ  had  the  advantage  of  portability  and 
rendered  uniform  the  nature  of  the  stimulus.  Its  sustained  tone 
made  possible  the  holding  of  the  intervals  for  a  longer  duration  than 
would  be  possible  on  the  piano,  thereby  contributing  to  a  greater 
awareness  of  interval  quality.  The  organ  had  a  further  advantage 
in  that  its  tone  was  less  familiar  to  all  taking  the  test.  Persons 
familiar  with  the  sound  of  a  piano  were  therefore  placed  on  a 
more  equal  footing  with  those  having  no  experience  with  the  sound 
of  a  piano. 

Purpose  of  the  Experiment

The  general  purpose  of  the  test  and  learning  experiment  was  to 
test  the  recognition  of  a  specific  interval  before  and  after  drill  ex- 
ercises on  that  interval.  The  interval  used  for  this  first  test  was  the 
octave.²  The  test  may  be  described  as  follows:

a.  Presentation  of  the  interval  to  be  learned

Six  examples  of  the  interval  were  played,  the  examples  were 
sung  by  the  class,  and  the  pupils  were  asked  to  remember  the  sound 
of  the  interval  to  be  studied. 

b.  Presentation  of  a  series  of  forty  intervals  (first  series) 

Ten  of  the  forty  intervals  were  examples  of  the  specific  interval 
under  consideration.  Intervals  were  presented  in  four  groups  of 
ten  each.  Pupils  were  asked  to  indicate  on  test  sheets  whether  each 
interval  as  it  was  played  was  the  same  as  the  interval  to  be  re- 
membered, or  a  different  interval. 

c.  Presentation  of  a  learning  series  (second  series) 

The  same  task  as  in  the  first  series  was  required,  but  intervals 
were  presented  in  progressively  larger  groups  starting  with  three 
and  ending  with  eight.  Each  group  was  preceded  by  an  example 
of  the  interval  to  be  remembered.  The  correct  answers  to  all  items 
in  this  series  were  revealed  to  the  pupils  to  allow  them  to  substantiate 
the  accuracy  of  their  own  responses. 


2  Another  test  using  the  perfect  fifth  for  recognition  was  also  used,  but  since 
the  results  do  not  contribute  materially  to  the  present  discussion  they  are  not 
included  in  this  account. 
d.  Presentation  of  the  original  forty  intervals  (third  series)  in
changed  order

Results  of  the  Experiment 

Results  on  the  octave  test  were  studied  for  two  purposes.  First, 
to  discover  types  of  intervals  most  often  mistaken  for  the  octave. 
Second,  to  note  possible  gains  or  losses  in  discriminating  power  for 
each  type  of  interval  after  the  drill  or  "learning"  series  had  been 
given.  Data  from  a  class  of  thirty-seven  pupils  in  the  seventh  grade 
are  presented. 

Table  I  represents  mean  errors  for  the  entire  class  on  the  first 
and  third  series  combined.  Intervals  are  ranked  according  to 
difficulty.  An  inspection  of  the  errors  on  this  test  throws  some 
light  on  the  two  factors  of  interval  apprehension  of  pitch  distance 
and  interval  quality.  Comparing  the  three  most  difficult  intervals 
with  the  three  least  difficult,  it  will  be  noted  that  the  sum  of  the 
pitch  differences  of  the  three  most  difficult  is  twelve  half  steps  as 
compared  with  a  sum  of  only  four  for  the  least  difficult  intervals 
(omitting  reference  to  the  octave  itself). 
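
The two sums just quoted can be checked from the sizes of the intervals alone. The sketch below assumes the conventional half-step sizes of the ten intervals; the dictionary and function names are illustrative.

```python
# Interval sizes in half steps (conventional semitone counts).
SEMITONES = {
    "Perfect 5th": 7, "Minor 6th": 8, "Major 6th": 9, "Minor 7th": 10,
    "Major 7th": 11, "Octave": 12, "Minor 9th": 13, "Major 9th": 14,
    "Minor 10th": 15, "Major 10th": 16,
}

def distance_from_octave(name):
    """Difference in pitch distance from the octave, in half steps."""
    return abs(SEMITONES[name] - SEMITONES["Octave"])

# Rankings taken from Table I, leaving the octave itself out of account.
most_difficult = ["Major 6th", "Major 10th", "Perfect 5th"]
least_difficult = ["Major 7th", "Minor 9th", "Major 9th"]

sum_most = sum(distance_from_octave(n) for n in most_difficult)
sum_least = sum(distance_from_octave(n) for n in least_difficult)
print(sum_most, sum_least)
```

The intervals most often confused with the octave are thus, taken together, three times as far from it in pitch distance as the intervals least often confused with it.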

Pitch distance, therefore, appears not to be the sole factor in
such discrimination, for if it were we should expect errors to
increase as the pitch distance of the interval approached the octave.
Almost the reverse is true in these data, suggesting that interval
discrimination might be accomplished more through recognition of
quality differences. It seems reasonable to suppose that errors on
the most difficult intervals, the major sixth, the major tenth, and
the perfect fifth, occurred because of similarity of interval quality
with the octave, notwithstanding the large differences in pitch
distance. Similarly we may credit the relative ease in distinguishing
the major seventh, the minor ninth, and the major ninth to differences
in interval quality, notwithstanding the small differences in
pitch distance from the octave.

TABLE I

Errors in the Recognition of Octaves for the Combined First and Third
Series of the Octave Recognition Test, Based on the
Scores of 37 Seventh-Grade Pupils

                   Difference in Pitch                Number of Times
                   Distance from the      Total       Interval
Type of Interval   Octave in Terms of     Number of   Appeared in       Mean
                   Half Steps             Errors      Both Series       Error*

Major 6th                  3                119            10            .31
Major 10th                 4                 85             8            .29
Perfect 5th                5                 42             6            .19
Minor 6th                  4                 55             8            .19
Minor 7th                  2                 41             6            .18
Minor 10th                 3                 34             6            .15
Major 9th                  2                 32             6            .14
Octave                     0                107            20            .14
Minor 9th                  1                 18             6            .08
Major 7th                  1                 10             4            .07

Total                                                      80

* The mean error was computed by dividing the total number of errors for
each interval by the total number of trials. Thus for the major sixth the total
number of errors, 119, was divided by the total trials, 370 (37 x 10), the quotient
being .31.
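
The computation described in the note to Table I may be sketched as follows. The figures are those printed in the table; the names are illustrative, and the quotients are left unrounded, so that small differences from the printed column in the last digit are possible.

```python
N_PUPILS = 37

# (interval, total errors, times the interval appeared in both series), from Table I
TABLE_I = [
    ("Major 6th", 119, 10), ("Major 10th", 85, 8), ("Perfect 5th", 42, 6),
    ("Minor 6th", 55, 8), ("Minor 7th", 41, 6), ("Minor 10th", 34, 6),
    ("Major 9th", 32, 6), ("Octave", 107, 20), ("Minor 9th", 18, 6),
    ("Major 7th", 10, 4),
]

def mean_error(total_errors, appearances, n_pupils=N_PUPILS):
    """Total errors divided by total trials (pupils x appearances of the interval)."""
    return total_errors / (n_pupils * appearances)

total_appearances = sum(appearances for _, _, appearances in TABLE_I)

for name, errors, appearances in TABLE_I:
    print(f"{name:<12} {mean_error(errors, appearances):.3f}")
```

The appearances sum to eighty, that is, the forty items of the first series together with the same forty items of the third series.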

Table II presents a comparison of errors between the first and
third series of intervals. In the table the intervals are ranked
simply according to size. The third series represents response
after the drill or second series. The last column, representing
differences in discriminating ability in the third series over the first
series, revealed the rather disquieting fact that while there seemed
to be an improvement on some intervals, there was relatively little
difference on at least six intervals, and a noticeable loss on the part
of one. No attempt was made to test the significance of these gains
or losses in discrimination, but it was recognized that the learning
situation as set up was not producing the consistent results which
had been hoped for. There was no way to account for the apparent
losses for certain intervals, although obviously they represented
increasing confusion of these intervals with the octave in the
third series over the first.

TABLE II

Comparison of Errors between the First and Third Series of the Octave
Recognition Test, Based on the Scores of 37
Seventh-Grade Pupils*

                          Mean Errors*             Gain or Loss
Type of Interval   First Series   Third Series   in Discrimination

Perfect 5th            .17            .21             -.04
Minor 6th              .20            .17             +.03
Major 6th              .43            .21             +.22
Minor 7th              .10            .27             -.17
Major 7th              .10            .04             +.06
Octave                 .14            .15             -.01
Minor 9th              .06            .10             -.04
Major 9th              .21            .08             +.13
Minor 10th             .14            .16             -.02
Major 10th             .34            .23             +.11

* Any apparent discrepancies between the errors in Tables I and II result
from the rounding off of figures.
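
The last column of Table II is simply the first-series mean error less the third-series mean error. A minimal sketch of that subtraction follows, with the mean errors as read from Table II and entered in hundredths to avoid rounding trouble. The cutoff of .06 for "relatively little difference" is an assumption made here for illustration; the study states no such figure.

```python
# Mean errors from Table II, in hundredths: (first series, third series).
TABLE_II = {
    "Perfect 5th": (17, 21), "Minor 6th": (20, 17), "Major 6th": (43, 21),
    "Minor 7th": (10, 27), "Major 7th": (10, 4), "Octave": (14, 15),
    "Minor 9th": (6, 10), "Major 9th": (21, 8), "Minor 10th": (14, 16),
    "Major 10th": (34, 23),
}

SMALL = 6  # hypothetical cutoff, in hundredths, for "relatively little difference"

def gain(first, third):
    """Positive when errors fell after the learning series, negative when they rose."""
    return first - third

gains = {name: gain(*pair) for name, pair in TABLE_II.items()}
little_change = [name for name, g in gains.items() if abs(g) <= SMALL]
largest_loss = min(gains, key=gains.get)

for name, g in gains.items():
    print(f"{name:<11} {g / 100:+.2f}")
```

With this cutoff six intervals fall in the class of little change, and the minor seventh stands out as the single noticeable loss, in agreement with the description given above.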

Tentative  Conclusions 

From  the  particulars  derived  from  the  study  of  Tables  I  and  II, 
certain  observations  were  made. 

1.  Objective  evidence  lent  some  support  to  previous  subjective 
observations  on  the  importance  of  interval  quality  when  compared 
with  pitch  distance  in  the  discrimination  of  intervals. 

2.  Given  an  appropriate  selection  of  intervals  on  the  basis  of
difficulty,  and  omitting  the  "learning"  series,  it  seemed  possible  to 
construct  tests  of  this  type  which  would  reveal  a  wide  spread  of 
ability,  provided  other  aspects  of  this  test  procedure  were  satis- 
factory. 

3.  These  data  did  not  furnish  sufficient  evidence  to  decide 
whether  the  differences  in  gain  were  due  to  chance  or  to  some  real 
difference  in  the  extent  to  which  discrimination  of  the  octave  could 
be  learned.  If  such  real  differences  existed,  a  test  of  learning  could 
be  made  practicable  by  presenting  intervals  on  which  gains  could 
be  effected,  omitting  the  more  difficult  intervals.  To  attempt  a
learning  situation  without  a  knowledge  of  these  differences  was  to 
risk  a  serious  flaw  in  the  technique  of  testing. 

4.  The  long  span  of  memory  retention  made  necessary  by  the 
continuous  presentation  of  forty  items  might  well  have  accounted 
for  the  lack  of  appreciable  gains  in  the  ability  for  individual  inter- 
vals in  the  test. 

Suggested  Patterns  for  Succeeding  Tests 

1.  Continue  the  simultaneous  (harmonic)  presentation  of  in- 
tervals. This  appeared  desirable  because  of  the  stress  on  qualitative 
aspects  of  intervals  when  so  presented. 

2.  Lessen  the  tax  on  memory  for  intervals.  The  objectives  of 
the  study  did  not  include  memory  but  did  include  the  ability  to 
make  differentiations  between  intervals.  As  the  test  then  stood,  the 
memory  of  a  stated  interval  to  be  learned  had  to  extend  through 
the  presentation  of  forty  items. 

3.  Increase  the  frequency  of  situations  in  which  responses  in- 
volving interval  comparisons  might  take  place.  This  suggestion 
was  an  outgrowth  of  the  preceding  one. 

4.  Construct  an  individual  test,  in  order  to  pay  close  attention 
to  types  of  perception  utilized  by  single  subjects  taking  the  test. 

Constructing  an  Individual  Test  of  Interval  Recognition

The  Pattern  Used 

An  individual  test  was  constructed  which  took  into  account  all 
suggestions  and  clues  so  far  developed.  Because  of  its  distinctive 
sound,  the  perfect  fifth  seemed  particularly  advantageous  for  use 
in  testing.  Provision  was  also  made  for  a  learning  situation,  a  type 
of  test  technique  found  in  tests  in  other  fields.  Examples  of  this 
procedure  were  found  in  the  Orleans-Solomon  Latin  Prognosis  Test³
and  the  Orleans  Geometry  Prognosis  Test.⁴

Mixed  intervals  containing  some  perfect  fifths  were  used,  the 
subject  being  required  to  identify  all  perfect  fifths.  Groups  of  in- 
tervals, however,  were  restricted  to  nine,  each  group  being  preceded 
by  several  examples  of  perfect  fifths.  The  device  of  using  common 
tones  in  some  of  the  series  was  employed  similar  to  the  technique 
used  in  the  Schoen  test. 

The  test  as  a  whole  consisted  of  four  subtests,  each  containing 
three  groups  of  nine  intervals  each,  a  total  of  twenty-seven  intervals 
for  each  subtest,  and  a  grand  total  of  one  hundred  eight  intervals. 
Each  group  of  nine  intervals  was  preceded  by  examples  of  perfect 
fifths.  In  groups  where  common  tones  were  used,  the  last  example 
of  a  perfect  fifth  just  preceding  the  group  of  nine  intervals  was 
based  on  the  common  tone  used  in  the  series. 

The four subtests were organized according to the following
classification:

Test 1. Presentation of intervals having a common bass.
Test 2. Presentation of intervals having a common treble.
Test 3. Presentation of intervals having no common tone, and
        located in a narrow pitch range.
Test 4. Presentation of intervals having no common tone, and
        located in a wide pitch range.

The  obvious  purpose  of  this  classification  was  to  attempt  to  aid 
the  learner  by  means  of  the  help  afforded  in  the  beginning  by  common
tones  and  narrow  pitch  range.  In  the  new  test,  the  more
dissonant  intervals  of  the  major  and  minor  seventh  were  first  con- 
trasted with  the  perfect  fifth,  and  toward  the  end  of  each  subtest 
the  discriminations  narrowed  down  to  more  subtle  differences  using 
major  and  minor  thirds  and  perfect  fourths. 


3  Orleans,  Jacob  S.,  and  Solomon,  Michael.  Orleans-Solomon  Latin  Prognosis
Test.  World  Book  Co.,  Yonkers-on-Hudson,  N.  Y.,  1926.

4  Orleans,  Joseph  B.,  and  Orleans,  Jacob  S.     Orleans  Geometry  Prognosis 
Test.     World  Book  Co.,  Yonkers-on-Hudson,  N.  Y.,  1929. 
The  real  departure  of  the  individual  test  from  the  techniques  of
former  group  tests  of  the  study  consisted  in  the  provision  that  each
group  of  nine  intervals  in  the  subtest  was  to  be  repeated  until  the
subject  made  two  perfect  scores.  In  a  study  by  Gordon⁵  this  type  of
approach  was  utilized.  Gordon's  chief  thesis  was  that  the  number 
of  repetitions  necessary  to  secure  a  perfect  performance  of  a  dic- 
tated melody  was  a  significant  index  of  a  general  musical  ability 
of  the  subject.  In  providing  a  repetition  of  the  subtest  until  two 
perfect  scores  were  earned,  the  individual  test  did  what  no  group 
test  could  do — adjust  the  learning  situation  to  individual  differences. 
The  number  of  repetitions  needed  for  perfect  response  at  each  level 
of  the  test,  it  was  felt,  might  serve  as  some  index  of  a  capacity  for 
learning.  This  concern  for  the  learner,  it  will  be  observed,  had  been 
a  motivating  force  in  the  conduct  of  the  study  from  the  beginning, 
although  this  approach  was  given  up  as  an  integral  part  of  the 
final  test. 
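
The index of learning described here, the number of repetitions needed before two perfect scores were made, might be tallied as in the following sketch. It assumes that the two perfect scores need not be consecutive, a point the account leaves open; the function name and the sample scores are hypothetical.

```python
def repetitions_until_two_perfect(scores, group_size=9, needed=2):
    """Count presentations of a nine-interval group until `needed` perfect scores occur.

    `scores` lists the number right on each successive presentation.
    Returns None if the required number of perfect scores is never reached.
    Assumption: perfect scores are counted cumulatively, not consecutively.
    """
    perfect = 0
    for presentation, score in enumerate(scores, start=1):
        if score == group_size:
            perfect += 1
            if perfect == needed:
                return presentation
    return None

# Hypothetical record for one subject on one group of nine intervals.
print(repetitions_until_two_perfect([6, 8, 9, 7, 9]))
```

A subject who never reaches two perfect scores yields no index at all under this scheme, which bears on the later report that no one completed the entire test.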

Administration  of  the  Individual  Test 

In  the  administration  of  the  individual  test  a  number  of  subjects 
were  secured  who  were  students  of  advanced  education  in  various 
subject  fields  other  than  music.  The  task  of  the  test  was  much  too 
simple  for  the  average  student  of  music.  Each  subject  was  en- 
couraged to  describe  the  various  mental  processes  employed  during 
the  taking  of  the  tests.  Ten  case  studies  of  adults  and  ten  of 
seventh-grade  pupils  were  conducted.  Of  the  latter  ten,  five  were 
chosen  from  among  the  highest  scorers  on  previous  tests  of  interval 
discrimination,  and  five  from  among  the  lowest.  No  attempt  was 
made  to  keep  the  technique  of  testing  uniform,  the  object  being  to 
adjust  and  orient  each  person  to  the  conditions  imposed  by  the  test. 
Some  persons  rarely  got  beyond  the  first  few  groups  of  intervals, 
and  no  one  completed  the  entire  test.  Individual  results  on  various 
portions  of  the  test  were  consequently  not  comparable,  and  are  not 
reported.  The  various  descriptions  and  introspections  of  the  sub- 
jects as  they  attempted  to  meet  new  situations  were  of  considerable 
interest,  and  the  report  of  their  responses  was  organized  according 
to  types  of  response.  Some  experiences  were  common  to  several 
persons;  some  held  for  only  one  person;  and  in  other  instances  there
was  an  overlapping  of  experiences.  The  evidence  which  follows  is 
not  intended  as  convincing  data,  but  is  presented  simply  as  partial 
support  and  confirmation  of  some  of  the  assumptions  and  interpre- 
tations of  findings  developed  earlier  in  the  study. 

5  Gordon,  Kate.  "Some  Tests  on  the  Memorizing  of  Musical  Themes."
Journal  of  Experimental  Psychology,  Vol.  2,  1917,  pp.  93-99. 

Digest  of  Case  Studies  Using  the  Individual  Test

An  authority  in  advanced  speech,  who  scored  well  on  the  test, 
discovered  she  was  using  the  same  type  of  listening  technique  which
she  employed  in  analyzing  the  vocal  quality  of  speech  students. 
This  statement  would  support  the  observations  already  made  that 
perception  of  differences  in  intervals  is  based  chiefly  on  a  discrimi- 
nation of  qualitative  differences,  and  that  the  chief  function  of  im- 
portance in  the  discrimination  of  intervals  is  this  qualitative  aspect 
of  response. 

The  assistance  rendered  by  the  common  tones  seemed  to  be 
utilized  particularly  by  persons  whose  errors  and  number  of  repeti- 
tions necessary  for  successful  response  suggested  a  low  degree  of 
aptitude.  Several  of  these  subjects  stated  that  in  groups  containing 
these  common  tones  the  identification  of  the  fifths  was  effected 
through  the  functioning  of  a  memory  for  the  pitch  of  the  two  tones 
of  the  fifth  which  had  been  presented  in  the  example  just  previous 
to  the  test  series.  The  pitch  imagery  of  the  example  apparently 
facilitated  the  recognition  of  the  fifths  in  that  particular  test  group. 
To  such  persons  the  common  tones  provided  a  measure  of  security 
and  confidence.  The  feeling  of  security  dropped,  and  the  conscious- 
ness of  interval  differences  tended  to  lessen  greatly,  when  intervals 
having  no  tones  in  common  were  presented.  With  one  adult  a 
feeling  of  distinct  frustration  developed,  sufficient  to  discourage 
him  from  further  effort. 

However,  if  the  identification  was  wrong,  and  the  subject  mis- 
took another  interval  for  a  perfect  fifth,  he  chose  all  other  intervals 
sounding  like  the  first  mistaken  identification  as  perfect  fifths, 
causing  himself  to  be  hopelessly  confused  until  the  series  was  re- 
peated with  fresh  impressions  of  the  perfect  fifth  in  the  examples. 

This  account  of  persons  apparently  weak  in  interval  recognition 
is  given  at  length  because  from  this  material  there  would  appear  to 
be  a  strong  suggestion  that  the  inclusion  of  common  tones  in  any 
test  situation  of  this  kind  introduced  an  element  of  response  associ- 
ated with  pitch  memory  which  was  somewhat  foreign  to  the  function 
which  had  been  selected  for  study.  This  response  apparently  played 
little  part  in  the  recognition  of  the  interval  as  a  tonal  quality,  at 
least  within  the  short  span  covered  by  the  test.  Moreover,  some  of 
the  subjects  reported  that  they  made  better  headway  on  the  signifi- 
cant aspects  of  the  test  when  the  common  tones  were  removed. 

One  subject  indicated  that  the  interval  of  the  fifth  ought  to  be 
changed  to  another  interval  because  it  was  possible  for  him  to  sing 
a  triad  mentally,  as  do-mi-sol,  and  arrive  in  this  way  at  an  identification
of  the  perfect  fifth.  Here  again  was  a  situation  similar  to
one  previously  uncovered,  where  a  certain  type  of  musical  spelling 
was  invented.  In  this  case  a  chord-wise  spelling  was  used.  The 
subject  apparently  knew  just  enough  about  music  to  do  this,  but 
lacked  the  ready  perception  to  identify  a  fifth  upon  any  other  basis. 
The  results  of  these  case  studies,  in  so  far  as  they  brought  to  light 
the  somewhat  independent  functioning  of  the  factors  of  interval 
relationship  and  memory  for  pitch,  offer  a  suggestion  for  further 
study  of  the  relative  importance  of  each  function  in  practical 
musical  experience. 

Determining  the  Psychological  Aspects  of  Response 

A  synthesis  of  the  many  data  which  had  accumulated  did  more 
than  suggest  a  pattern  for  a  test.  With  a  fair  degree  of  certainty
it  was  possible  to  venture  into  the  psychological  aspects  of  response 
and  to  describe  the  conditions  under  which  successful  measurement 
of  the  discrimination  of  intervals  could  take  place. 

Pitch  imagery,  or  memory  for  single  tones,  appeared  to  play 
little  part  in  the  function;  in  fact,  certain  aspects  of  this  function,
so  far  as  could  be  determined,  actually  stood  in  the  way  of  success- 
ful performance  on  some  of  the  tests.  The  function  of  memory, 
whether  for  pitch  or  for  intervals,  was  not  the  basic  response  sought 
for,  regardless  of  its  undoubted  importance  through  all  musical 
experience.  This  consideration  of  the  function  of  memory  brought 
the  whole  study  to  a  sort  of  crossroads.  If  the  investigation  were 
to  be  concerned  with  the  improvability  of  a  function,  it  would  natu- 
rally need  to  consider  memory  as  an  intrinsic  and  distinct  factor  in 
the  process  of  learning.  However,  it  has  been  noted  that  if  improve- 
ment were  to  take  place  on  the  test,  the  selection  of  material  would 
have  to  be  limited  to  those  intervals  on  which  it  could  be  proved 
real  learning  might  take  place.  This  would  tend  to  place  a  limita- 
tion on  the  choice  of  the  intervals  which  could  be  included  in  such 
a  test.  It  seemed  advisable,  therefore,  to  drop  any  attempt  to  in- 
clude a  learning  situation  in  favor  of  testing  the  ability  for  dis- 
crimination on  a  wide  selection  of  intervalic  material. 

Inferences  for  future  testing,  derived  from  the  results  of  these 
latest  considerations,  were  outlined  as  follows: 

a. Present intervals in various pitch ranges, having no notes in
   common, thereby greatly reducing the suggestions for pitch imagery.

b. Reduce suggestions of melodic pattern to a minimum.

c. Reduce the memory span of attention.

d. Increase the number of qualitative observations of contrasting
   interval quality.

These  considerations  were  instrumental  in  the  devising  of  a 
multiple-response  test  item  which  was  the  next  development  in  the 
study. 

The  Multiple-Response  Test  Item 

Pattern  of  the  Item 

From  the  conditions  suggested  by  previous  experimentation,  a 
short  test  item  was  constructed,  consisting  of  the  sounding  of  four 
intervals,  three  of  which  were  the  same  type  of  interval,  while  the 
remaining  one  was  different.  Intervals  were  placed  on  different 
pitch  levels,  and  an  attempt  was  made  not  to  duplicate  any  notes 
within  a  given  item.  The  subject's  task  was  to  detect  the  interval 
which  was  different.  Response  on  each  item  was  made  in  terms  of 
the  position  of  this  different  interval  in  the  sequence  of  the  four 
intervals,  whether  first,  second,  third,  or  fourth.  All  tests  from  this 
stage  to  the  end  of  the  study  utilize  this  technique  of  testing.  In 
subsequent  discussion,  therefore,  the  interval  which  was  used  three 
times  to  provide  the  foundation  for  the  discrimination  of  the  different
interval  is  referred  to  as  the  basic  interval  of  the  test  item.  The
interval  appearing  once,  and  upon  which  the  recognition  of  inter- 
valic  difference  was  to  be  focused,  is  referred  to  as  the  contrasted 
interval. 
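
The pattern of the item may be made concrete by a short sketch. It assumes that notes are represented by whole numbers counted in half steps and that interval sizes are given in half steps; these representations, the function names, and the sample pitches are illustrative and form no part of the test materials. The sketch also enforces strictly the rule against duplicated notes, which in the study was only attempted.

```python
def build_item(basic_size, contrasted_size, contrasted_position, lower_notes):
    """Build four intervals as (lower, upper) pairs of half-step numbers.

    basic_size / contrasted_size: interval sizes in half steps.
    contrasted_position: 1 to 4, the place of the different interval.
    lower_notes: four lower notes, one per interval (hypothetical pitch numbers).
    """
    if contrasted_position not in (1, 2, 3, 4):
        raise ValueError("position must be 1, 2, 3, or 4")
    if len(lower_notes) != 4:
        raise ValueError("exactly four lower notes are needed")
    item = []
    for index, low in enumerate(lower_notes, start=1):
        size = contrasted_size if index == contrasted_position else basic_size
        item.append((low, low + size))
    notes = [note for pair in item for note in pair]
    if len(set(notes)) != len(notes):
        raise ValueError("a note is duplicated within the item")
    return item

def is_correct(response, contrasted_position):
    """The response names the position of the interval heard as different."""
    return response == contrasted_position

# Basic interval a perfect fifth (7), contrasted interval a minor seventh (10),
# with the contrasted interval sounded third.
item = build_item(7, 10, 3, [48, 53, 62, 58])
print(item, is_correct(3, 3))
```

Since the response is one of four positions, a pure guess on such an item is right only one time in four, as against one time in two on the earlier smaller-or-larger judgment.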

The  construction  of  this  multiple-response  item  satisfies  a  num- 
ber of  conditions.  Its  relatively  short  span  permits  a  number  of 
separate  test  situations  of  the  ability  to  make  quality  discrimina- 
tions, and  subsequent  tests  used  between  forty  and  ninety  of  such 
items,  depending  upon  the  maturity  of  the  groups  and  the  speed 
with  which  the  items  could  be  given.  The  memory  span  needed  for 
any  one  observation  is  greatly  reduced  over  previous  tests.  There  is 
comparatively  little  melodic  suggestion  to  the  hearer.  Most  impor- 
tant, the  attention  on  differences  in  interval  quality  is  brought  to 
the  fore,  through  the  use  of  different  pitch  placement  of  all  intervals. 

Description  of  Tests  Using  the  Multiple-Response  Test  Item 

Various  forms  of  interval  tests  were  constructed  in  order  to 
satisfy  certain  objectives  of  study.  There  were  three  distinct  stages 
in  the  development  of  these  tests.  In  the  first  stages  of  test  construc- 
tion interest  was  still  focused  on  providing  some  favorable  situation 
for  orienting  the  subject  to  the  conditions  of  the  test  with  the  possibility
of  some  learning  or  improvement.  There  was  particular  concern
for  persons  with  no  previous  experience  in  music.  In  the  first
tests  of  this  kind,  the  basic  interval  was  uniform  throughout  the  test 
and  contrasted  intervals  were  introduced  in  series  of  ten  items  each. 
Test  forms  using  this  technique  were  labeled  Q  series,  two  of  these 
forms  being  described.  The  last  test  of  this  series  added  some  of  the 
conditions  described  in  the  second  stage  of  testing. 

In  the  second  stage  of  test  development  greater  diversity  of 
paired  comparisons  was  introduced,  and  the  basic  intervals  were 
not  kept  uniform  throughout  the  test.  Paired  intervals  of  a  given 
kind,  however,  were  presented  in  groups  of  five  items  each.  Tests 
of  this  series  were  labeled  the  R  series,  and  one  form  of  this  kind  is 
described. 

In  the  third  stage,  after  preliminary  tests  indicated  the  feasi- 
bility of  the  method,  paired  intervals  were  not  only  introduced  in 
a  profuse  variety,  but  were  changed  completely  for  each  successive 
item.  The  tests  of  this  kind  were  labeled  the  T  series,  and  four  forms 
of  this  kind  are  described. 

A description of test forms follows:

First Stage: The Q Series

Form 3. Consisted of sixty items presented in six groups of ten each, using
the following interval comparisons:

    Group    Basic Interval    Contrasted Interval

      1.     Perfect 5th       Minor 7th
      2.     Perfect 5th       Octave
      3.     Perfect 5th       Diminished 5th
      4.     Perfect 5th       Major 6th
      5.     Perfect 5th       Major 3rd
      6.     Perfect 5th       Perfect 4th

Form 9. Consisted of ninety items in eighteen groups of five each, using the
following interval comparisons:

    Group    Basic Interval    Contrasted Interval

      1.     Minor 7th         Perfect 5th
      2.     Perfect 5th       Major 3rd
      3.     Perfect 5th       Major 6th
      4.     Perfect 5th       Diminished 5th
      5.     Perfect 4th       Diminished 5th
      6.     Major 6th         Minor 7th
      7.     Perfect 5th       Minor 7th
      8.     Major 6th         Perfect 5th
      9.     Perfect 4th       Minor 6th
     10.     Perfect 4th       Minor 3rd
     11.     Perfect 5th       Octave
     12.     Minor 7th         Major 6th
     13.     Major 6th         Perfect 4th
     14.     Perfect 5th       Perfect 4th
     15.     Major 6th         Diminished 5th
     16.     Minor 7th         Diminished 5th
     17.     Minor 7th         Major 6th
     18.     Perfect 4th       Perfect 5th
Perfect  5th 

Second Stage: The R Series

Form 1. Consisted of fifty items. It contained additional interval
combinations not found in previous forms. It provided ten groups of five
each, using the following comparisons:

    Group    Basic Interval    Contrasted Interval

      1.     Minor 7th         Minor 6th
      2.     Perfect 5th       Minor 7th and Major 7th
      3.     Minor 7th         Diminished 5th
      4.     Perfect 5th       Diminished 5th
      5.     Perfect 4th       Minor 3rd
      6.     Perfect 4th       Major 6th
      7.     Major 6th         Perfect 4th
      8.     Perfect 5th       Perfect 4th
      9.     Perfect 5th       Major 6th
     10.     Major 6th         Minor 6th

Third Stage: The T Series⁶

Forms 1, 2, and 3. These were three preliminary forms of a test embodying
a definite experimental design. Each form contained forty-eight items.
Throughout all three forms selected and systemized factors were
randomized, in order to provide a maximum distribution of possible
interactions of factors.

Form 4. This was the final test form. It consisted of fifty items, each item
constructed of factors selected on the basis of study of Forms 1, 2,
and 3 of the same series.

Tentative  Validity  and  Reliability  Coefficients 

It  was  necessary  at  the  earliest  possible  opportunity  to  determine 
the  relationship  which  existed  between  scores  on  some  of  the  tests 
and  criteria  of  musical  ability.  Retests  were  also  carried  out  to 
determine  the  reliability  of  the  test  techniques  adopted.  Some  in- 
terest was  centered  on  the  relationship  with  certain  existing  tests  in 
music,  and  with  measures  of  general  intelligence. 

Studies  on  the  Secondary  School  Level 

Initial  tests  using  unselected  groups  in  junior  high  schools  cor- 
related very  little  with  teacher  estimates  of  pupil  ability.  This  was 
not  totally  unexpected,  since  there  were  no  specialized  activities  in 
music,  and  teachers  had  heavy  teaching  loads,  making  it  difficult  for 
them  to  give  adequate  ratings  on  pupil  ability.  However,  an  exami- 
nation of  the  range  of  test  scores  indicated  that  an  acceptable  distri- 
bution of  scores  was  being  obtained. 

With  this  latter  assurance,  the  Q3  test  was  administered  to  the
entering  class  of  161  students  of  music  (ninth  academic  grade)  of 
the  Music  and  Art  High  School  in  New  York  City.  Tests  were 
administered  before  any  instruction  had  been  given.    At  the  end  of 


6  A  copy  of  this  detailed  material  has  been  deposited  in  the  Psychology 
Library,  Columbia  University. 
the  semester  various  music  marks  were  obtained  on  these  students
and  correlated  with  their  scores  on  the  test.  These  correlations 
helped  at  the  time  to  provide  information  on  the  nature  of  the  valid- 
ity likely  to  exist  when  more  refined  measures  of  both  test  scores 
and  teacher  ratings  were  obtained.  Correlations  with  various  mea- 
sures of  ability  in  dictation  were  +  .59,  +  .53,  +  .45,  +  .39,  and  +  .32, 
using  classes  of  thirty-two,  thirty-nine,  forty-one,  fifty-three,  and 
forty-three,  respectively.  Two  correlations  with  sight-singing  were 
+  .35  and  +  .46  for  forty-two  and  forty-three  students  respectively.
Results  from  both  of  these  correlations  suggested  that  here  were  two
activities  worthy  of  study  in  connection  with  future  tests.  Correla- 
tions with  grades  in  written  theory  were  only  +  .13  and  +  .16,  casting 
doubt  on  the  use  of  the  tests  for  prognosis  of  this  ability. 

Another  incidental  study  was  carried  out  at  the  Horace  Mann 
School  in  New  York  City.  Scores  on  the  R1  test  were  correlated  with
teacher  ratings  on  an  objective  of  instruction  designated  as  tonal 
learning.  A  description  of  this  objective  is  presented  in  the  fourth 
chapter  in  connection  with  the  validation  of  the  final  test.  Correla- 
tions with  estimates  by  the  instructor  were  +  .44  for  thirty-six 
seventh-grade  students,  +  .89  for  twenty-two  eighth-grade  students, 
and  +  .65  for  thirty-nine  ninth-grade  students. 

A  difference  in  the  validation  procedures  for  the  two  schools 
should  be  noted.  Measures  of  student  ability  at  the  Music  and  Art 
High  School  were  assembled  at  the  end  of  the  semester.  The  cri- 
terion used  at  the  Horace  Mann  School  consisted  of  measures  of  the 
students'  ability  at  the  time  of  taking  the  tests.  However,  the  cri- 
teria are  not  comparable,  since  they  do  not  bear  the  same  definition 
nor  are  they  derived  from  the  same  instructor.  The  relationships 
in  both  schools  were  considered  of  importance  in  the  validation  of 
the  tests  since  the  criteria  were  obtained  from  objectives  of  instruc- 
tion. 

A  Study  on  the  College  Level 

Another  incidental  study  was  carried  out  in  a  class  of  eighteen 
in  chromatic  harmony  at  Teachers  College,  Columbia  University. 
Students  taking  the  test  were  ranked  on  the  basis  of  teacher  esti- 
mates of  ability  in  the  subject  and  also  on  a  composite  of  class  marks. 
Using  a  rank  order  method  the  correlations  were  +  .58  and  +  .57  for 
the  two  criteria. 
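The text does not say which rank order method was used. A common choice is Spearman's rank-difference coefficient, rho = 1 - 6Σd²/(n(n² - 1)), and the sketch below assumes it. The ranks are invented for illustration and are not the study's data.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank-difference coefficient for two rankings without ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # Sum of squared differences between the paired ranks
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical ranks for six students: rank on the test and rank by the teacher
test_rank = [1, 2, 3, 4, 5, 6]
teacher_rank = [2, 1, 4, 3, 6, 5]
rho = spearman_rho(test_rank, teacher_rank)
print(f"rho = {rho:.2f}")
```

With a class of eighteen, as in the study, the same function applies unchanged to two rankings of length eighteen, provided ties are absent or have been resolved beforehand.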

Retests  on  Preliminary  Forms 

Retests  on  the  Q3  form  showed  a  reliability  of  +  .72  for  seventy- 
nine  cases  retested  within  a  period  of  seven  days,  using  pupils  from 
grades  seven  and  eight.  Retests  on  the  same  form  for  grades  six
and  seven  after  six  months  revealed  a  correlation  of  +  .64,  using 
fifty-seven  pupils. 

Correlations  with  Measures  of  Intelligence 

Always  of  interest  in  connection  with  any  new  test  of  ability  is 
the  relationship  to  measures  of  general  intelligence.  Correlation 
between  the  Q3  test  scores  and  the  Stanford-Binet  test  for  ninety-six 
pupils  in  the  seventh  and  eighth  grades  was  +  .13.  On  the  college 
level  the  correlation  between  the  Q3  test  and  scores  on  the  Otis  Self- 
Administering  Intelligence  Test  (Higher  Form)  for  sixteen  stu- 
dents was  +  .23.  These  correlations  are  not  significantly  different 
from  zero. 
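The statement that these correlations do not differ significantly from zero can be checked with the usual t test for a correlation coefficient, t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. The text does not say which test the author applied, so the following is only a sketch under that assumption. The critical values are the two-tailed 5 per cent points taken from standard t tables.

```python
import math

def t_for_r(r, n):
    """t statistic for testing a correlation r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# (r, n) as reported in the text
reported = {"Stanford-Binet": (0.13, 96), "Otis": (0.23, 16)}
# Two-tailed 5 per cent critical values of t for 94 and 14 degrees of freedom
critical = {"Stanford-Binet": 1.986, "Otis": 2.145}

for name, (r, n) in reported.items():
    t = t_for_r(r, n)
    significant = abs(t) > critical[name]
    print(f"{name}: t = {t:.2f}, df = {n - 2}, significant at 5% = {significant}")
```

Both t values fall well below their critical values, which agrees with the conclusion stated in the text.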

Summary 

The  present  chapter  has  presented  the  steps  taken  in  the  develop- 
ment of  testing  techniques.  It  has  presented  and  discussed  the 
various  psychological  aspects  of  response  which  appeared  to  be 
crucial  factors  in  the  general  function  of  interval  discrimination. 
Tentative  validity  and  reliability  coefficients  have  been  reported, 
and  the  relationship  of  the  function  to  some  intelligence  tests  has 
been  indicated.  A  vehicle  in  the  form  of  a  multiple-response  test 
item  has  been  described.  Reference  has  been  made  to  an  experimen- 
tal design  for  a  final  test  form,  using  the  multiple-response  test  item 
as  a  basis  for  testing.  In  the  following  chapter  a  detailed  descrip- 
tion of  the  work  of  developing  and  refining  test  items  is  presented. 


CHAPTER  III 

ATTEMPTS AT REFINEMENT OF THE TEST INSTRUMENT

This  chapter  presents  a  discussion  of  the  means  undertaken  to 
construct  a  test  of  interval  discrimination  with  as  much  reliability 
and  validity  as  a  study  of  item  validity  could  provide.  In  a  practi- 
cal way  this  amounted  to  a  study  of  the  means  for  the  selection  and 
presentation  of  the  four  intervals,  three  basic  and  one  contrasted,  of 
the  multiple-choice  test  item  discussed  in  the  previous  chapter.  This 
study  was  carried  out  by  means  of  an  experimental  design  which 
was  incorporated  into  three  test  forms.  The  account  of  the  con- 
struction of  this  design,  the  administration  of  the  tests,  the  analysis 
of  results,  and  the  construction  of  a  final  test  form  constitutes  the 
material  of  this  chapter. 

Fundamental  Objectives  and  Review  of  Progress 

The  gist  of  the  thesis  of  the  entire  study,  as  outlined  in  the  first 
chapter,  has  been  that  the  ability  to  differentiate  between  musical 
intervals  appeared  to  be  such  an  integral  part  of  larger  and  more 
complex  activities  in  music  of  a  tonal  nature  that  a  measure  of  the 
former  might  well  serve  as  a  means  for  estimating  the  latter.  The 
first  two  chapters  have  presented  a  description  of  and  justification 
for  preliminary  research  which  was  largely  psychological  in  nature. 
This  exploratory  work  led  first  to  the  establishment  of  what  ap- 
peared to  be  the  most  significant  aspects  of  response  on  the  function 
of  interval  discrimination  worthy  of  use  in  later  tests.  In  addition, 
the  multiple-response  test  item  was  devised,  which  was  designed  to 
record  in  an  objective  manner  the  type  of  response  which  the  psy- 
chological inquiry  had  helped  to  develop.  A  series  of  these  items 
constituted  a  test  purporting  to  measure  the  function  of  interval 
discrimination. 

Further  Needs  of  the  Study 

What  was  now  needed  was  a  systematic  study  of  item  validity  in 
order  that  a  final  test  form  might  possess  a  high  degree  of  differenti- 
ating power.  Under  the  circumstances  the  best  criterion  for  such 
validity  appeared  to  be  total  score.  There  were  two  principal 
avenues  open  for  the  study  of  item  validity.  The  first  was  to  deter- 
mine, if  possible,  types  of  interval  combination  which  yielded  high 
validity  values.    The  second  was  to  ascertain  the  relative  validity  of 
certain  structural  patterns  of  the  multiple-response  item.  This
called  for  a  systematic  organization  and  distribution  of  various  types 
of  interval  combination  and  item  patterns  throughout  an  experi- 
mental test  series  upon  which  a  study  of  item  validity  could  be  made. 
The  resulting  data  would  then  serve  as  valuable  reference  material 
for  the  construction  of  a  final  test  form. 

Chapter  III  is  an  account  of  this  phase  of  the  study,  starting 
with  the  setting  up  of  an  experimental  test  series  and  ending  with 
the  assembling  and  recording  of  the  final  test. 

Outline of Final Procedures in the Refinement of the Test Instrument

A  brief  outline  of  the  procedures  employed  in  the  development 
of  test  items  used  in  the  final  instrument  of  measurement  may  be  of 
help  in  following  the  many  steps  which  must  necessarily  appear  in 
a  work  of  this  kind.    The  various  divisions  are  as  follows : 

1.  Classification  of  factors 

The  means  for  presenting  sequences  of  four  intervals  in  test 
items  were  classified  into  four  general  categories,  each  of 
which  contained  from  three  to  eight  subclasses. 

2.  Construction  of  a  design  as  a  basis  for  an  experimental  test 

Into  a  design  for  a  test  went  a  random  distribution  of  sys- 
temized  factors.  The  purpose  of  this  distribution  was  to 
permit  the  greatest  possible  interaction  of  the  various  sub- 
classes as  they  operated  within  the  items  made  up  for 
testing. 

3.  Administration  of  the  experimental  test 

The  experimental  test  was  made  up  into  three  forms,  and 
was  administered  to  a  sampling  of  elementary  and  secon- 
dary school  population.  Data  from  these  tests  are  pre- 
sented. 

4.  Computation  of  bi-serial  correlations  on  test  items

An  item  analysis  was  conducted  by  means  of  bi-serial  corre- 
lations, using  as  a  criterion  performance  on  each  test  as  a 
whole.  All  items  were  then  ranked  according  to  these 
values  from  high  to  low.  An  index  of  validity  of  the  vari- 
ous subclasses  in  the  design  was  devised  and  the  subclasses 
were  then  ranked  according  to  these  values  from  high  to  low. 

5.  Construction  of  a  final  test

A  final  test  was  made  up  of  items,  some  of  which  were  used 
without  alteration  from  the  experimental  form  because  of 
their  high  validities.  Certain  practical  considerations  re- 
quired the  construction  of  new  items.  For  this  work  use 
was  made  of  ranked  lists  of  validity  values  of  the  various 
factors  of  item  construction.  Patterns  of  construction
drawn  upon  most  heavily  were  from  those  classes  in  which
the  observed  validity  values  were  highest.

6.  Administration  of  the  final  test  form

Data  showing  the  means  and  standard  deviations  of  groups 
taking  the  final  test  are  reported. 
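The item analysis named in step 4 of the outline can be illustrated with the textbook bi-serial formula, r = ((Mp - Mq)/s)(pq/y). Here Mp and Mq are the mean total scores of those passing and failing the item, s is the standard deviation of all total scores, p and q are the proportions passing and failing, and y is the ordinate of the normal curve at the point dividing p from q. The study does not print its formula, so this is a sketch under that assumption, and the scores are invented.

```python
from statistics import NormalDist, mean, pstdev

def biserial(item_correct, total_scores):
    """Bi-serial r between a pass/fail item and the total score on the test."""
    passed = [s for ok, s in zip(item_correct, total_scores) if ok]
    failed = [s for ok, s in zip(item_correct, total_scores) if not ok]
    if not passed or not failed:
        raise ValueError("item must be passed by some examinees and failed by others")
    p = len(passed) / len(total_scores)
    q = 1 - p
    std = NormalDist()
    y = std.pdf(std.inv_cdf(p))  # normal ordinate at the point dividing p from q
    return (mean(passed) - mean(failed)) / pstdev(total_scores) * (p * q / y)

# Hypothetical data: ten examinees, 1 = item answered correctly
item_correct = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
total_scores = [40, 30, 35, 26, 33, 29, 22, 31, 27, 34]
r_bis = biserial(item_correct, total_scores)
print(f"bi-serial r = {r_bis:.2f}")
```

Ranking the items by this value from high to low gives the ordering described in step 4. With small or markedly non-normal samples the coefficient can exceed 1, which is a known property of the bi-serial formula.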

Although  much  could  be  learned  from  the  three  experimental 
tests,  they  could  not  provide  complete  information  for  the  construc- 
tion of  the  final  test.  For  example,  it  was  not  known  how  many  of 
the  items  in  these  forms  would  be  usable  in  the  final  test  form,  or 
how  many  would  have  to  be  constructed  on  the  basis  of  information 
developed  through  an  analysis  of  the  items  and  the  factors  of  their 
construction.  Much  depended  upon  the  difficulty  and  the  validity 
of  specific  items  as  to  whether  they  could  be  used  without  alteration. 
In  the  construction  of  new  items  no  exact  information  on  their  diffi- 
culty or  validity  could  be  had,  although  it  appeared  possible  for  all 
practical  purposes  to  make  certain  approximations  of  these  values. 

It  will  be  noted  in  several  operations  in  the  analysis  of  data  that 
it  seemed  advisable  to  deviate  somewhat  from  the  plan  as  outlined, 
because  of  unforeseen  developments  in  the  data,  but  the  results  to 
be  obtained  from  the  altered  procedures  appeared  to  outweigh  the 
benefits  from  the  original  plan.  Moreover,  to  have  been  altogether 
sure  of  the  reliability  and  validity  of  new  items  in  the  final  test  it 
would  have  been  necessary  to  have  tried  them  out  under  actual  test 
conditions.  In  view  of  the  exploratory  nature  of  the  final  test,  this 
added  refinement  did  not  appear  to  warrant  the  effort. 

Procedures in the Construction of the Final Test Instrument

Classification  of  Factors 

A  study  of  the  manner  of  presenting  the  four  intervals  of  the 
multiple-choice  test  item  showed  that  there  were  four  main  factors 
which  called  for  consideration.     They  were : 

1.  Type of interval combination.

2.  Position of the contrasted interval to serve as answer response.

3.  Type of pitch direction or sequence of interval progression.

4.  Pitch range of presented intervals: narrow or wide.

For the first factor there were a large number of interval combinations
available for use; more, in fact, than could be accommodated
in several tests. For reasons of expediency which will be explained
later, this number was confined to twenty-four pairs of intervals,
which, when reversed with respect to basic and contrasted function,
numbered forty-eight subclasses of this first category.

Since  each  test  item  contained  four  intervals,  the  number  of 
positions  of  the  contrasted  interval  or  answer  response  which  could 
be  considered  in  this  category  was  four. 

A  study  of  possible  pitch  directions  resulted  in  setting  up  eight 
different  subclasses  for  this  category. 

Three  subclasses  of  pitch  range  were  provided,  by  which  the 
spacing  of  the  four  intervals  could  be  controlled. 

The  Mechanical  Procedure  of  the  Experiment 

Before  taking  up  the  details  of  the  selection  of  the  subclasses  in 
the  experimental  design,  the  mechanical  plan  for  the  distribution 
of  factors  should  be  explained. 

To  recapitulate,  the  numerical  distribution  of  the  various  sub- 
classes was  as  follows : 

1.  Types of interval presentation                       48
2.  Position of contrasted interval (answer number)       4
3.  Types of pitch direction                              8
4.  Degrees of pitch range                                3

Previous experience in giving these tests had shown that fifty
items constituted a convenient number for presentation at any one
time. Since each item was always repeated in testing, this meant in
reality that one hundred items were played. For mathematical
convenience in the experimental test this number was reduced to
forty-eight. This number of items not being sufficient for the purpose of
the experiment, three forms were planned which then would provide
144 items for study.

In the mechanical conception of the experimental design the
forty-eight items of each of the three tests were considered as
forty-eight separate plots on which were distributed, at random, equal
numbers of all of the subclasses decided upon. The construction of
items was based entirely upon a set of specifications which were
automatically assigned to each plot or item. For each test form
forty-eight 4x6 cards were laid out. First the specifications for
each of the forty-eight subclasses of the first category (interval
combination) were written down on the forty-eight cards or plots. For
the second category (position of contrasted interval) twelve sets of
the four answer positions were written down on smaller cards,
which were then shuffled and placed on the forty-eight larger cards.
This was followed by the distribution of the pitch-degree patterns
consisting of six sets of the eight subclasses. The distribution of
sixteen sets of the three space ranges then completed the task.

This  entire  procedure  was  repeated  for  the  other  test  forms.  In 
the  end  there  were  144  sets  of  specifications,  all  representing  poten- 
tial test  items.  Each  plot,  or  card,  contained  four  types  of  specifi- 
cations for  the  construction  of  an  individual  test  item.  The  first 
specification  listed  the  type  of  interval  combination  to  be  used.  The 
second  specification  dictated  the  answer  position  of  the  contrasted 
interval.  The  third  specification  showed  which  pitch  direction  the 
intervals  were  to  take,  and  the  last  specification  controlled  the  spac- 
ing of  the  intervals  one  from  another.  The  entire  mechanical  pro- 
cedure, therefore,  tended  to  permit  the  freest  possible  interaction  of 
all  factors  included  in  the  design. 
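The card procedure described above amounts to a balanced random assignment, and it can be sketched in a few lines. The numeric codes for the subclasses, the dictionary keys, and the fixed seed are illustrative assumptions. The text also does not say whether the large cards were themselves shuffled to set the order of items, so that step is left out.

```python
import random
from collections import Counter

def build_form(rng):
    """Return forty-eight item specifications (plots) for one experimental form."""
    combos = list(range(1, 49))          # 48 interval-combination subclasses, one per card
    positions = [1, 2, 3, 4] * 12        # twelve sets of the four answer positions
    directions = list(range(1, 9)) * 6   # six sets of the eight pitch-direction patterns
    ranges = [1, 2, 3] * 16              # sixteen sets of the three pitch ranges
    for deck in (positions, directions, ranges):
        rng.shuffle(deck)                # shuffle the small cards before dealing them out
    return [
        {"combination": c, "answer_position": p, "direction": d, "pitch_range": r}
        for c, p, d, r in zip(combos, positions, directions, ranges)
    ]

rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
forms = [build_form(rng) for _ in range(3)]
print(len(forms), "forms,", sum(len(f) for f in forms), "item specifications")
print(Counter(item["answer_position"] for item in forms[0]))
```

Each form holds every subclass an equal number of times, while the pairing of subclasses on any one card is left to chance, which is what allows the factors to interact freely.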

The Selection of Interval Combinations in the Experimental Design

If  the  main  thesis  of  the  study  were  to  be  strictly  followed,  a  test 
of  interval  discrimination  would  have  to  test  ability  to  distinguish 
between  all  possible  pairs  of  intervals.  This  is,  of  course,  quite 
impractical.  Even  if  the  selection  were  confined  to  the  eleven 
intervals  available  within  the  octave,  fifty-five  paired  interval  com- 
parisons would  result.  This  number  is  raised  to  one  hundred  and 
ten  if  use  is  made  of  each  interval  according  to  its  basic  and  con- 
trasted function  in  the  multiple-response  item.  Eliminating  the 
minor  and  major  seconds  because  of  their  pronounced  dissonance 
and  ease  of  recognition,  the  use  of  the  remaining  nine  intervals  still 
produces  thirty-six  pairs  of  intervals  or  seventy-two  types  of  inter- 
val treatment  in  all.  The  experimental  test  form  could  not  con- 
veniently include  all  these  and  at  the  same  time  provide  sufficient 
numbers  of  each  combination  upon  which  to  base  any  study  of  the 
relative  validity  of  each  type  of  combination  used. 
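The counts quoted in this paragraph follow from the number of ways of choosing two intervals out of n, doubled when each member of a pair may serve in turn as the basic interval. A brief check (the function name is illustrative):

```python
from math import comb

def pair_counts(n_intervals):
    """Unordered pairs, and ordered (basic, contrasted) treatments, of n intervals."""
    pairs = comb(n_intervals, 2)
    return pairs, 2 * pairs

within_octave = pair_counts(11)    # all eleven intervals within the octave
without_seconds = pair_counts(9)   # minor and major seconds removed
print(within_octave, without_seconds)
```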

There  were  two  alternatives  open.  A  plan  for  random  selection 
of  all  possible  interval  combinations  could  have  been  used,  including 
intervals  larger  than  the  octave.  The  other  alternative  was  to  make 
a  selection  of  paired  intervals  based  on  what  could  be  learned  of  the 
apparent  effect  on  item  validity  of  certain  interval  combinations.
There  were  certain  data  which,  though  incomplete,  seemed  to  lend 
justification  to  the  selection  of  certain  types  of  intervals  and  interval 
combinations  over  others. 
The  latter  alternative  was  the  one  adopted,  for  at  the  time  it
seemed  better  to  use  information  which  would  give  some  promise  of 
validity  in  certain  directions  than  to  allow  the  intervals  to  be 
selected  simply  on  the  basis  of  chance  alone.  It  is  admitted  that, 
for  all  the  effort  involved  in  seeking  and  justifying  the  elimination 
of  certain  paired  intervals,  it  might  have  been  better  with  extra  work 
to  have  included  at  least  all  combinations  within  the  octave.  How- 
ever, since  the  study  did  operate  on  a  selective  basis  of  interval  use, 
it  becomes  necessary  to  note  the  procedure. 

This  discussion  may  not  entirely  be  without  its  merits,  however, 
for  from  it  may  come  a  suggestion  for  a  later  psychological  study 
of  interval  perception. 

A  research  study  by  Ortmann¹  seemed  of  particular  significance
in  connection  with  the  problem  of  the  selection  of  intervals  for  the 
experimental  test  form.  Ortmann,  in  studying  errors  of  students 
in  the  identification  of  intervals  in  classroom  work  in  musical  theory, 
found  a  large  proportion  of  errors  among  intervals  of  what  he  terms 
a  mixed  fusion  group.  The  intervals  in  question  are  the  minor 
sixth,  minor  third,  minor  seventh,  and  the  augmented  fourth  (di- 
minished fifth).  He  reports  relatively  few  errors  in  his  classifica- 
tion of  marked  consonance  intervals,  and  also  relatively  few  among 
the  dissonant  intervals  of  the  major  and  minor  second  and  the  major 
seventh. 

The  intervals  of  each  classification  and  errors  for  each  he  reports 
as  follows : 


                              Number of Errors

Marked Consonances
    Perfect octave                    3
    Perfect 5th                      52
    Major 3rd                        55
    Major 6th                        72
    Perfect 4th                      71
        Total                       253

Mixed Fusions
    Minor 6th                       152
    Minor 3rd                       151
    Minor 7th                       144
    Aug. 4th (Dim. 5th)             125
        Total                       572

Dissonances
    Major 2nd                        35
    Major 7th                        53
    Minor 2nd                        46
        Total                       134

The  concentration  of  errors  in  the  mixed  fusion  category  seems 
noteworthy  and  appears  to  have  some  bearing  on  what  has  been 


1  Ortmann,  Otto,  op.  cit.,  pp.  48-69. 
learned  of  item  validity  from  one  of  the  preliminary  test  forms  of
the  present  study.  This  validity  study  is  discussed  presently. 
Ortmann's  report  does  not  state  whether  the  number  of  opportuni- 
ties for  error  for  these  students  were  constant  for  all  intervals. 
Furthermore,  the  classification  of  intervals  and  the  order  of  their 
ranking  are  his  own,  and  may  or  may  not  be  based  on  proven 
degrees  of  consonance,  fusion,  or  dissonance.  Nevertheless,  the  en- 
tire classification,  together  with  the  observation  of  concentration  of 
errors,  seemed  sufficiently  important  to  study  in  connection  with  the 
item  analysis  conducted  on  the  R1  test.²

Computations were made on the R1 test in order to obtain some
indication of the relative validity of types of interval combinations
used in that test. Index values of validity for types of interval
combination were computed by averaging the bi-serial correlations
of items which contained these types of combination. These values
obviously do not represent any mean validity.³
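The study's footnote on this method observes that transforming the r's to z's before averaging would give a somewhat more accurate average, though probably not a very different one. The comparison can be sketched as follows. The correlations are invented for illustration, and Fisher's z is taken here to be the inverse hyperbolic tangent of r.

```python
import math

def mean_r(rs):
    """Index used in the study: the plain average of the correlations."""
    return sum(rs) / len(rs)

def mean_r_via_z(rs):
    """Average after Fisher's z transformation, converted back to r."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Hypothetical bi-serial correlations of five items sharing one interval combination
item_rs = [0.70, 0.62, 0.58, 0.66, 0.69]
plain = mean_r(item_rs)
via_z = mean_r_via_z(item_rs)
print(f"plain average = {plain:.3f}, z-transformed average = {via_z:.3f}")
```

For positive correlations of similar size the two averages differ only in the third decimal place, in line with the footnote's remark.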

Table III presents the different interval combinations of the R1
test ranked in the order of their index values of validity. In the
first column of the table are difference values in the degree of fusion
according to Ortmann's classification. These values are computed

TABLE III

Indices of Validity of Paired Interval Combinations in the R1 Test, Based
on the Bi-Serial Correlations of Individual Items Taken from
199 Test Papers

Paired Intervals*                   Fusion        Pitch      Validity
                                    Difference†   Distance   Value‡

Minor 7th and minor 6th                 2            2         .65
Perfect 5th and minor 7th               6            3         .59
Perfect 5th and major 7th               8            4         .56
Minor 7th and diminished 5th            1            4         .52
Perfect 5th and diminished 5th          7            1         .52
Perfect 4th and minor 3rd               2            2         .51
Perfect 4th and major 6th               1            4         .50
Major 6th and perfect 4th               1            4         .45
Perfect 5th and perfect 4th             3            2         .42
Perfect 5th and major 6th               2            2         .43
Major 6th and minor 6th                 2            1         .40

* Interval placed first was basic; that is, played three times in the item.
† Based on Ortmann's fusion classification.
‡ Determined by summing the correlation values for similar paired intervals
and dividing by the number of such pairs.

2  See  p.  37. 

3  This  method  of  averaging  correlation  values  has  been  used  throughout  the 
study  of  the  various  tests  considered  in  this  chapter.  To  have  transformed  the 
r's  to  z's  before  averaging  would  have  given  a  somewhat  more  accurate  average, 
but  the  difference  would  probably  not  be  very  important. 
by  assigning  consecutive  numerals  to  the  intervals  used  in  the  test
and  then  noting  the  numerical  difference*  in  position  between  the 
two  intervals  of  each  combination.  In  the  second  column  of  the 
table  are  differences  in  pitch  distance  expressed  in  terms  of  half 
steps  found  between  the  two  intervals  of  each  combination  used. 
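The two difference columns of Table III can be reproduced from this description. The sketch assumes that the consecutive numerals follow the order in which Ortmann's intervals are listed earlier in the chapter, with the major and minor seconds left out. Under that assumption the values printed in the table are recovered exactly. The dictionary and function names are illustrative.

```python
# Ortmann's ordering with the seconds omitted, numbered consecutively from 1
FUSION_ORDER = ["octave", "perfect 5th", "major 3rd", "major 6th", "perfect 4th",
                "minor 6th", "minor 3rd", "minor 7th", "diminished 5th", "major 7th"]
FUSION_DEGREE = {name: i for i, name in enumerate(FUSION_ORDER, start=1)}

# Size of each interval in half steps
HALF_STEPS = {"minor 3rd": 3, "major 3rd": 4, "perfect 4th": 5, "diminished 5th": 6,
              "perfect 5th": 7, "minor 6th": 8, "major 6th": 9, "minor 7th": 10,
              "major 7th": 11, "octave": 12}

def differences(basic, contrasted):
    """Return (fusion difference, pitch distance in half steps) for one pair."""
    fusion = abs(FUSION_DEGREE[basic] - FUSION_DEGREE[contrasted])
    pitch = abs(HALF_STEPS[basic] - HALF_STEPS[contrasted])
    return fusion, pitch

print(differences("minor 7th", "minor 6th"))         # first row of Table III
print(differences("perfect 5th", "diminished 5th"))  # fifth row of Table III
```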

The  following  observations  are  made  without  recourse  to  a  study 
of  statistical  differences.  The  incomplete  selection  of  intervals  used 
in  the  test,  together  with  a  possible  unreliability  of  the  difference 
values  derived  from  the  Ortmann  classification,  permit  only  some 
generalized  observations. 

Examining  the  indices  of  validity  of  intervals  from  the  different 
categories  of  the  Ortmann  classification,  the  following  points  are 
noted : 

a.  Among the five pairs of highest validity index, every
    combination contains at least one interval from either the
    mixed fusion or dissonance categories. One combination
    contains intervals both of which are in the mixed fusion group.

b.  Of the five pairs of lowest validity index, every combination
    contains at least one interval from the marked consonance
    group, and in four of these pairs both intervals are from the
    marked consonance group.

Examining  differences  in  fusion  degree  the  following  is  to  be 
noted : 

a.  In the five pairs with high validity index the sum of
    differences in fusion degree is twenty-four, while in the five
    pairs with lowest validity index the sum is only nine.

b.  Among the five pairs with highest validity index is one
    combination whose intervals are only two degrees of fusion
    apart but both of which are contained within the mixed
    fusion group.

Examining  the  factor  of  pitch  distance  the  following  is  noted : 

a.  In the five pairs of highest validity index are two pairs with
    small pitch differences of one and two half steps. The first,
    the minor seventh and minor sixth, has a small fusion
    difference of two degrees but both intervals are in the mixed
    fusion group. The second pair noted, the perfect fifth and
    diminished fifth, with only one half step of pitch difference,
    has a large fusion difference of seven degrees.

b.  In the five pairs with lowest validity are three pairs with
    small pitch differences of one and two half steps. The perfect
    fifth and the perfect fourth have three degrees of fusion
    difference and are both in the marked consonance group. The
    remaining two, the perfect fifth and the major sixth, and the
    major sixth and minor sixth, contain one interval each in the
    marked consonance category and both pairs are separated by
    two degrees of fusion difference.


*  Major and minor seconds were not used in the R1 test; hence the numerical
values do not compare identically with Ortmann's complete classification.

From  these  observations  on  the  intervals  used  in  the  R1  test  the
following  tentative  inferences  are  drawn : 

1.  Except in the case of certain intervals within the mixed fusion
    group, wide differences in fusion appear on the whole to be
    accompanied by higher validity values.

2.  Paired intervals found within the mixed fusion group appear on
    the whole to be accompanied by higher validity values
    regardless of the small fusion difference between the intervals.
    The relation between this situation and the high concentration
    of errors in identification reported by Ortmann of intervals of
    this same group seems more than mere coincidence.

3.  Combinations containing intervals both of which are in the
    marked consonance groups, and hence relatively low in fusion
    difference, seem to be accompanied by lower indices of validity.

4.  With the exception of intervals in the mixed fusion group,
    pairs with both low pitch difference and low fusion difference
    are accompanied by low validities, and a corresponding higher
    validity seems present when a low pitch distance is
    accompanied by a wide difference in fusion degree.

It  must  be  admitted  that  if  much  depended  upon  the  use  of  these 
inferences  there  would  be  serious  weakness  in  such  procedure.  How- 
ever, the  alternative  to  selection  on  this  basis  was  mere  random 
sampling,  and  it  will  be  seen  presently  that  the  final  selection  pro- 
vided for  a  considerable  diversification  of  interval  combinations. 

On the basis of these inferences there seemed some justification
for establishing a presumptive superiority toward validity of
intervals in the mixed fusion and dissonance groups. This gave rise
to a plan for the inclusion in the new experimental form of
combinations selected freely from these groups. It was decided to use all
combinations made up of intervals from the mixed fusion group.
These numbered six in all.

Combinations made up of intervals outside the mixed fusion
group were also used except where both fusion degree and pitch
distance were small. These accepted combinations numbered
eighteen. Major and minor seconds continued to be omitted because of
their pronounced dissonance and their ease of recognition when
compared with other intervals. For a similar reason the major
seventh was not paired with the small intervals of the major and
minor third.

A  total  of  twenty-four  paired  intervals  was  therefore  used, 
which,  when  reversed  with  respect  to  basic  and  contrasted  intervals, 
made  forty-eight  types  of  interval  patterns  available  for  item 
construction. 

Paired  intervals  from  the  mixed  fusion  category  are : 

Minor  6th  and  minor  3rd 
Minor  6th  and  minor  7th 
Minor  6th  and  diminished  5th 
Minor  3rd  and  minor  7th
Minor  3rd  and  diminished  5th 
Minor  7th  and  diminished  5th 

Other  paired  intervals  selected  are : 

Perfect  5th  and  major  6th 
Perfect  5th  and  perfect  4th
Perfect  5th  and  minor  3rd 
Perfect  5th  and  minor  7th 
Perfect  5th  and  diminished  5th 
Perfect  5th  and  major  7th 
Major  3rd  and  major  6th 
Major  3rd  and  minor  6th 
Major  3rd  and  minor  7th 
Major  3rd  and  diminished  5th 
Major  6th  and  minor  3rd 
Major  6th  and  diminished  5th 
Major  6th  and  major  7th 
Perfect  4th  and  minor  3rd 
Perfect  4th  and  minor  7th 
Perfect  4th  and  major  7th 
Minor  6th  and  major  7th 
Diminished  5th  and  major  7th 
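The counts given for these two lists can be checked mechanically. The following is a minimal sketch in Python, not part of the original procedure; the abbreviations (Mi, Ma, P, Dim) follow those used in the tables of this chapter, and the variable names are illustrative only.

```python
from itertools import combinations

# The four intervals of the mixed fusion group named in the first list.
MIXED_FUSION = ["Mi 6", "Mi 3", "Mi 7", "Dim 5"]

# Every unordered pair drawn from the mixed fusion group.
mixed_pairs = list(combinations(MIXED_FUSION, 2))

# The eighteen "other" pairs, copied from the second list.
OTHER_PAIRS = [
    ("P 5", "Ma 6"), ("P 5", "P 4"), ("P 5", "Mi 3"), ("P 5", "Mi 7"),
    ("P 5", "Dim 5"), ("P 5", "Ma 7"), ("Ma 3", "Ma 6"), ("Ma 3", "Mi 6"),
    ("Ma 3", "Mi 7"), ("Ma 3", "Dim 5"), ("Ma 6", "Mi 3"), ("Ma 6", "Dim 5"),
    ("Ma 6", "Ma 7"), ("P 4", "Mi 3"), ("P 4", "Mi 7"), ("P 4", "Ma 7"),
    ("Mi 6", "Ma 7"), ("Dim 5", "Ma 7"),
]

all_pairs = mixed_pairs + OTHER_PAIRS

# Each pair yields two patterns: (basic, contrasted) and the reverse.
patterns = all_pairs + [(b, a) for a, b in all_pairs]

print(len(mixed_pairs), len(all_pairs), len(set(patterns)))
```

The four mixed fusion intervals give six pairs, the two lists together give twenty-four, and reversing basic and contrasted intervals gives forty-eight distinct patterns.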

The extent of the selection of the twenty-four pairs made it
appear reasonable that there was sufficient material to serve the
purposes for which the experimental test was designed.

The Selection of the Remaining Factors in the Experimental Design

Answer  Position 

The  position  of  the  contrasted  interval  determined  the  correct 
answer  to  the  item.  Equal  numbers  of  the  four  answer  positions 
of  test  items  were  used,  there  being  twelve  sets  distributed  at 
random. 
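A balanced random allocation of this kind might be sketched as follows. This is an illustration only; the study distributed the factors by hand with cards, and the function name and the use of a seeded shuffle are assumptions made for the example.

```python
import random
from collections import Counter


def balanced_answer_key(n_items=48, n_positions=4, seed=None):
    """Return a shuffled answer key using every position equally often."""
    if n_items % n_positions:
        raise ValueError("n_items must be a multiple of n_positions")
    per_position = n_items // n_positions
    # Twelve copies of each of the four positions for a 48-item form.
    key = [pos for pos in range(1, n_positions + 1) for _ in range(per_position)]
    random.Random(seed).shuffle(key)
    return key


key = balanced_answer_key(seed=7)
print(Counter(key))
```

Each of the four positions appears twelve times whatever the seed; only the order changes.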

Pitch  Direction 

There  were  eight  main  patterns  for  pitch  direction  used  in  the 
presentation  of  four  intervals.  These  were  numbered  for  future 
reference  and  may  be  represented  graphically  in  the  following 
manner : 

[Figure: the eight pitch direction patterns, numbered I to VIII, each
drawn as four dots placed at the relative pitch heights of the four
intervals in an item.]

The practical application of the patterns for pitch direction for
sequences of intervals presented certain problems. This concept of
pitch direction implied that intervals have pitch, a concept not very
applicable if the varying pitch distances of intervals are taken into
account. Do intervals have pitch? One answer to this question is
found in the study by Valentine⁵ on the aesthetic appreciation of
musical intervals. One of the conclusions of the study was that for
most of the subjects observed, the apparent pitch of the interval was
judged approximately by the pitch of its higher note, in
contradiction to a previous assertion that the lower note functioned
in this capacity. In the light of this finding the arrangement of
intervals according to pitch direction was made by applying the
pattern of pitch direction to the upper notes of intervals.

In order to reduce the operation of pitch memory, repetition of
identical intervals in any one item was avoided. The placing of the
contrasted interval was also made in such a way as to avoid all
noticeable effects of chord resolution, and any known resemblance of
note sequences to existing melodies was eliminated. These adjustments
were made possible by the somewhat flexible nature of the plan for
pitch range, a factor which will now be discussed.

5 Valentine, C. W., op. cit., p. 215.

Pitch  Range 

The  effect  of  varying  pitch  range  of  intervals  in  items  was  not 
known.    Three  categories  of  spacing  were  provided,  as  follows : 

Narrow  spacing — 1  to  2  half  steps 
Medium  spacing — 3  to  5  half  steps 
Wide  spacing     — 6  to  7  half  steps 

The application of the plan for pitch spacing also required a
concept of an approximate pitch for an interval. The upper note of
the interval was again taken to represent its approximate pitch, and
spacing was based on the range encompassed by these upper notes.
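The rule can be stated compactly in code. The sketch below assumes that the upper note of each interval is given as a semitone number (MIDI note numbers are used here purely for convenience; the original work was done on staff paper), and the function name is hypothetical.

```python
def spacing_category(upper_notes):
    """Classify an item by the range, in half steps, of its upper notes."""
    span = max(upper_notes) - min(upper_notes)
    if 1 <= span <= 2:
        return "narrow"
    if 3 <= span <= 5:
        return "medium"
    if 6 <= span <= 7:
        return "wide"
    return None  # outside the three categories provided in the plan


# Upper notes of four intervals in one item: C5, D5, C#5, C5 -> span of 2.
print(spacing_category([72, 74, 73, 72]))
```

A span of zero or of more than seven half steps falls outside the plan and returns `None` rather than being forced into a category.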

Miscellaneous  Details 

The plan, already discussed, for the mechanical distribution of
systemized factors was then carried out. All three test forms used
the twenty-four pairs of intervals but in a slightly different manner.
In Form 1 the first interval of each pair served once as a basic
interval and once as a contrasted interval, making a total of
forty-eight arrangements. In Form 2 each pair was examined so that in
the selection of intervals used as basic there was an equitable
distribution of consonance and dissonance qualities throughout the
test which avoided any accidental concentration of dissonant
intervals. Each adjusted pair was then included twice throughout the
test. In Form 3 the function of the intervals in each pair was
reversed over their use in Form 2. An interval appearing as basic in
any item in Form 2 became a contrasted interval in a corresponding
item in Form 3. These reversed arrangements appeared twice throughout
Form 3.

The interval patterns were first written on the large cards and
the remaining three factors, written on smaller cards, were
distributed over the large cards. The forty-eight items of each test
form were then ready to be composed. The musical notes of each item
were written down on a musical staff, based on the pattern of
construction which each item bore. The forms were labeled T1, T2, and
T3, respectively, and were now ready to administer.

The Administration of the Experimental Tests

The Decision as to Sampling

The final test developed in the study was intended for use
primarily in testing musical ability likely to be encountered from
the sixth grade to the twelfth grade. While it is true that some of
the most significant relationships of the study were established by
college students taking this final test, the intention was first to
measure ability in the public schools but to retain enough difficult
numbers so that, if possible, the test might be used outside this
academic range.

A  plan  was  made  for  sampling  various  levels  of  ability  which 
together  might  constitute  a  graded  distribution  of  musical  ability 
for  these  school  divisions.  A  sampling  based  on  academic  grades 
alone  would  not  quite  serve  the  purpose  except  for  grades  five  to 
nine  where  music  is  a  required  subject  in  most  school  systems. 

The senior high school presented a somewhat different situation,
however. In these grades music is an elective subject and the
activities carried on are the somewhat specialized activities of
chorus, band, orchestra, and classes in music theory. Since students
not enrolled in musical activities in the senior high school were not
receiving additional school experience or training in music through
organized instruction, it was felt that their scores as a group would
not contribute much more information than could be obtained from the
testing in the junior high school.

Students enrolled in musical activities in the senior high school,
however, could furnish information which was urgently needed. These
students were a select group and were also presumably improving their
abilities through their experience and instruction in music. For this
reason only music groups in the senior high school were tested.

The final selection of pupils assumed to provide a distribution of
musical abilities from grades five to twelve could be only subjective
at best. The groups used, together with their numbers, are as
follows:

                                                            N
Elementary schools (unselected groups), grades 5 to 6      288
Junior high schools (unselected groups), grades 7 to 9     271
Senior high schools (selected groups), grades 10 to 12     309
                                                           ---
                                                           868

Some attempt was made to provide for an equal distribution of
each of the test forms throughout the three student classifications.
The availability of student groups in equal proportions was almost
an impossibility, unless of course certain test papers were eliminated
from the totals. This would have had the effect of cutting down on
the number of cases available for study of item validity. Since the
actual distribution of musical ability throughout the three academic
classifications was a comparatively unknown factor, all available
papers from these categories were retained. The distribution of tests
for the different classifications is as follows:

                                    N    Percent

Form 1    Elementary schools       125      43
          Junior high schools      102      35
          Senior music groups       66      22
                                   ---     ---
                                   293     100

Form 2    Elementary schools       106      43
          Junior high schools       81      33
          Senior music groups       58      24
                                   ---     ---
                                   245     100

Form 3    Elementary schools       112      34
          Junior high schools      101      31
          Senior music groups      117      35
                                   ---     ---
                                   330     100

It should be noted that the proportions of the three tests
allocated to the corresponding academic divisions are not very
different. An attempt was also made to have each test form given in
several schools of the different academic classifications in order to
give some stability to the results.
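The percentages in the tabulation above can be recomputed from the counts. The text does not say how they were rounded; ordinary rounding would give 23 rather than 22 for the senior music groups on Form 1 (66 of 293 is about 22.5 percent). The sketch below therefore assumes, without any support from the source, a largest-remainder rule that forces each form to total 100. Under that assumption all nine printed values are reproduced.

```python
def percents_summing_to_100(counts):
    """Whole-number percentages that always total 100 (largest remainder)."""
    total = sum(counts)
    exact = [100 * c / total for c in counts]
    result = [int(x) for x in exact]          # round everything down first
    shortfall = 100 - sum(result)
    # Hand the missing points to the entries with the largest fractions.
    by_fraction = sorted(range(len(counts)),
                         key=lambda i: exact[i] - result[i], reverse=True)
    for i in by_fraction[:shortfall]:
        result[i] += 1
    return result


# Elementary, junior high, senior music groups for each form.
FORM_COUNTS = {"Form 1": [125, 102, 66],
               "Form 2": [106, 81, 58],
               "Form 3": [112, 101, 117]}

for form, counts in FORM_COUNTS.items():
    print(form, sum(counts), percents_summing_to_100(counts))
```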

Tabulation  of  Scores 

Scores  for  these  test  forms  are  presented  in  Table  IV. 

TABLE IV

Distribution of Scores on Tests T1, T2, and T3
(Score equals total wrong out of 48 items)

             N      Mean     SD     Range

T1          293     26.6     7.3     4-42
T2          245     27.0     6.5     2-39
T3          330     27.7     5.8     8-40

Totals      868     27.1     6.6     2-42

In view of the closeness of the means and the nature of the random
distribution of the systemized factors throughout the experimental
design, the three separate forms were, for practical purposes,
considered comparable. The discrepancy in the standard deviations
among the tests could have come from differences in the tests, but it
seemed more likely that it resulted from differences in the schools
themselves. For the study of items and factors the data from all
three tests were therefore combined.
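The "Totals" row of Table IV is consistent with pooling the three forms. As a hedged check, the sketch below combines the per-form N, mean, and SD, treating each SD as a population standard deviation (an assumption; the text does not say which divisor was used). The names are illustrative.

```python
from math import sqrt

# (N, mean, SD) for each form, as printed in Table IV.
FORMS = {"T1": (293, 26.6, 7.3), "T2": (245, 27.0, 6.5), "T3": (330, 27.7, 5.8)}


def pooled_mean_sd(groups):
    """Combine group sizes, means, and SDs into an overall N, mean, and SD."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Within-group variance plus the spread of group means about the grand mean.
    total_ss = sum(n * (sd ** 2 + (m - grand_mean) ** 2) for n, m, sd in groups)
    return n_total, grand_mean, sqrt(total_ss / n_total)


n, mean_wrong, sd_wrong = pooled_mean_sd(FORMS.values())
print(n, round(mean_wrong, 1), round(sd_wrong, 1))
```

Rounded to one decimal, the pooled values agree with the printed totals of 868, 27.1, and 6.6.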

A  Study  of  the  Results  of  Bi-serial  Correlations 

Ranking  of  Items  According  to  Validity 

Item validities were computed for each of the 144 items of the
three experimental forms. These items have been tabulated and ranked
according to their validities.⁶ In the table are also found the
specifications for their construction, together with the percent of
error for each. This table was referred to in the selection of
individual items for the final test. For all 144 items the
distribution of bi-serial values ranged from +.994 to +.133. In these
data it seemed interesting to note the presence of the minor seventh,
a mixed fusion interval, in each of the three highest ranking items.
This appeared to give weight to an earlier belief that intervals of
the mixed fusion group had more than a passing relationship to the
validities of items in which they appeared. Since it is not the
purpose of this study to develop this phase of the data, mention of
this condition is simply a matter of note.
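The computing formula for the bi-serial coefficient is not given in this chapter. The sketch below uses the standard textbook form, r = ((Mp - Mq) / s)(pq / y), where Mp and Mq are the mean total scores of those passing and failing the item, s is the standard deviation of all total scores, p and q are the proportions passing and failing, and y is the ordinate of the unit normal curve at the point dividing p from q. The use of a population SD, the use of scores counted as number right, and the sample data are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def biserial(item_correct, total_scores):
    """Bi-serial r between a pass/fail item and a continuous total score."""
    passed = [s for ok, s in zip(item_correct, total_scores) if ok]
    failed = [s for ok, s in zip(item_correct, total_scores) if not ok]
    if not passed or not failed:
        raise ValueError("item must be passed by some subjects and failed by others")
    p = len(passed) / len(total_scores)      # proportion passing the item
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))  # ordinate at the p/q cut
    return (mean(passed) - mean(failed)) / pstdev(total_scores) * (p * q / y)


# Hypothetical data: eight subjects, 1 = item answered correctly.
item = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [40, 36, 34, 30, 28, 24, 20, 16]
r_bis = biserial(item, scores)
print(round(r_bis, 3))
```

Because the coefficient assumes a normally distributed trait underlying the pass/fail split, it can exceed 1.0 on small or skewed samples, which is worth bearing in mind when reading very high values.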

Computing  Validity  Indices  for  Subgroups 

The method previously adopted⁷ of averaging the values of
bi-serial coefficients for the purpose of establishing validity
indices was used again in studying the factors of the experimental
design. These values when ranked provided a simple convenient means
for taking advantage of the apparent superiority of certain factors of
item construction over others. The use of this method of selection
did not rest on proven statistical evidence of differences in validity
between the various subfactors. Nevertheless, the use of these
techniques of selection did proceed as though there were differences,
for the sensible thing to do seemed to be to draw most heavily upon
those items and categories for which the observed validity values
were highest.
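The averaging procedure amounts to grouping items by one construction factor and taking the mean bi-serial value within each group. A minimal sketch follows; the field names and the five item values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def validity_indices(items, factor):
    """Mean bi-serial r for each level of one factor, highest first."""
    groups = defaultdict(list)
    for item in items:
        groups[item[factor]].append(item["r_bis"])
    indices = [(level, mean(values)) for level, values in groups.items()]
    return sorted(indices, key=lambda pair: pair[1], reverse=True)


# Hypothetical items, each tagged with its spacing class and bi-serial r.
items = [
    {"spacing": "1-2", "r_bis": 0.40},
    {"spacing": "1-2", "r_bis": 0.60},
    {"spacing": "6-7", "r_bis": 0.70},
    {"spacing": "6-7", "r_bis": 0.50},
    {"spacing": "3-5", "r_bis": 0.55},
]

for level, index in validity_indices(items, "spacing"):
    print(level, round(index, 3))
```

As the text notes, a ranking of this kind is descriptive only; it carries no test of whether the differences between levels are significant.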

The  Factor  of  Paired  Interval  Combinations 

Validity indices were computed for types of interval combinations
and are presented in Table V. Each value, listed in order of size,
represents the combined data from six items. It should again be noted
that the highest ranking interval combination contains two intervals
of Ortmann's mixed fusion category, while the lowest ranking
combination contains two intervals of the marked consonance category.

6 A copy of this detailed material has been deposited in the Psychology
Library, Columbia University.

7 See p. 46.

TABLE V

Validity Indices of Paired Interval Comparisons of
Items in Tests T1, T2, and T3
(Total Cases = 868)

                                              Differences Between
                                               Paired Intervals
Paired           Validity   Mean Percent   In Fusion     In Half
Intervals*       Index†       of Error       Rank         Steps

Dim 5 - Mi 7      .635         65.3            1            4
Mi 7 - Ma 3       .630         50.1            3            6
Mi 7 - Mi 3       .630         61.1            1            7
Ma 7 - Mi 6       .616         55.8            4            3
Ma 7 - Ma 6       .614         42.0            6            2
Ma 7 - Dim 5      .581         45.8            1            5
Mi 7 - P 4        .560         46.0            3            5
Ma 7 - P 5        .560         33.2            8            4
Ma 7 - P 4        .537         35.5            5            6
Mi 7 - P 5        .530         52.0            6            3
Mi 3 - P 5        .503         58.6            5            4
Mi 3 - Ma 6       .500         61.4            3            6
Dim 5 - Ma 3      .494         59.3            6            2
Ma 6 - P 5        .491         63.1            2            2
Dim 5 - P 5       .475         69.1            7            1
Dim 5 - Mi 3      .475         61.4            2            3
Ma 6 - Ma 3       .470         57.6            1            5
Mi 3 - P 4        .467         62.6            2            2
Mi 7 - Mi 6       .454         57.3            2            2
Dim 5 - Ma 6      .431         61.2            5            3
Mi 6 - Ma 3       .429         60.5            3            4
Dim 5 - Mi 6      .426         68.0            3            2
Mi 3 - Mi 6       .421         53.7            1            5
P 4 - P 5         .411         52.1            3            2

* Each interval combination occurred three times in one position and three
times in the reversed position.

† Each mean value is the average of six items.
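The "In Half Steps" column of Table V follows directly from the sizes of the two intervals in each pair, and can be recomputed as a check on the tabulation. The sketch below uses the conventional semitone sizes of the intervals; the listed differences are copied from Table V in row order, and the names are illustrative.

```python
# Size of each interval in half steps (semitones).
HALF_STEPS = {"Mi 3": 3, "Ma 3": 4, "P 4": 5, "Dim 5": 6, "P 5": 7,
              "Mi 6": 8, "Ma 6": 9, "Mi 7": 10, "Ma 7": 11}

# (first interval, second interval, half-step difference listed in Table V)
TABLE_V = [
    ("Dim 5", "Mi 7", 4), ("Mi 7", "Ma 3", 6), ("Mi 7", "Mi 3", 7),
    ("Ma 7", "Mi 6", 3), ("Ma 7", "Ma 6", 2), ("Ma 7", "Dim 5", 5),
    ("Mi 7", "P 4", 5), ("Ma 7", "P 5", 4), ("Ma 7", "P 4", 6),
    ("Mi 7", "P 5", 3), ("Mi 3", "P 5", 4), ("Mi 3", "Ma 6", 6),
    ("Dim 5", "Ma 3", 2), ("Ma 6", "P 5", 2), ("Dim 5", "P 5", 1),
    ("Dim 5", "Mi 3", 3), ("Ma 6", "Ma 3", 5), ("Mi 3", "P 4", 2),
    ("Mi 7", "Mi 6", 2), ("Dim 5", "Ma 6", 3), ("Mi 6", "Ma 3", 4),
    ("Dim 5", "Mi 6", 2), ("Mi 3", "Mi 6", 5), ("P 4", "P 5", 2),
]


def half_step_difference(first, second):
    """Absolute difference in size, in half steps, between two intervals."""
    return abs(HALF_STEPS[first] - HALF_STEPS[second])


mismatches = [row for row in TABLE_V
              if half_step_difference(row[0], row[1]) != row[2]]
print(len(TABLE_V), mismatches)
```

All twenty-four listed differences agree with the computed ones, so the list of mismatches is empty.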

It must remain for further research in which all possible
combinations of intervals are studied to determine whether these
differences are significant, and to attempt to discover reasons for
these differences should they be substantiated. A promising
suggestion for a solution of this problem might be found in the nature
of the difference tones in the intervals used. The difference tone of
any interval is a combination tone. Its frequency is the difference
between the frequencies of the two tones of the interval. In the case
of perfect intervals the difference tone harmonizes with the notes in
the interval and tends not to be outstanding. In the case of mild
dissonances, such as the diminished fifth and the minor seventh, the
difference tone resembles neither of the tones and clashes with both.
Study on this psychological problem might proceed on the hypothesis
that, by virtue of the high validities observable for certain mild
and pronounced dissonance intervals, persons high on the trait
measured by the test hear prominent difference tones in addition to
the quality of the intervals sounded, and are thus aided in making
correct choices on many of these items. Table V was used as another
reference in the construction of items for the final test form.
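The arithmetic of the difference tone can be illustrated briefly. The tuning of the recorded intervals is not stated in the text, so the sketch below assumes just-intonation frequency ratios, and the ratios chosen for the minor seventh and the diminished fifth are only one of several in common use. The 200-cycle lower note is arbitrary.

```python
from fractions import Fraction

# Assumed just-intonation ratios (upper note : lower note).
JUST_RATIOS = {
    "P 5": Fraction(3, 2),
    "P 4": Fraction(4, 3),
    "Mi 7": Fraction(9, 5),
    "Dim 5": Fraction(45, 32),
}


def difference_tone(lower_hz, ratio):
    """Frequency of the difference tone: upper frequency minus lower."""
    upper_hz = lower_hz * ratio
    return upper_hz - lower_hz


for name, ratio in JUST_RATIOS.items():
    tone = difference_tone(200, ratio)
    # Express the difference tone as a fraction of the lower note as well.
    print(name, float(tone), Fraction(tone) / 200)
```

For the perfect fifth the difference tone is exactly half the frequency of the lower note, that is, an octave below it, which is one sense in which it "harmonizes" with the interval. For the diminished fifth at 45:32 it is 13/32 of the lower note, a ratio unrelated to either tone by any simple interval.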

The  Factors  of  Pitch  Direction  and  Answer  Position 

Validity indices of items grouped according to pitch direction
were first computed for the eight classes in this category. In an
examination of these data there seemed to be a wide spread of values
for the different answer positions on each of the pitch directions.
The nature and extent of the data, as will be shown presently, did
not seem to warrant a statistical treatment to determine if the
interaction of pitch direction and answer position were significant.
It was recognized, however, that advantage should be taken of the
superiority which might seem to exist for certain answer positions of
each of the eight pitch directions over other answers. Put another
way, it seemed more advisable when a certain pitch direction was to be
chosen for use in an item that the answer position showing the highest
validity index should be used. In this sense, again, the data were
treated as though there were significant differences present, and the
values which appeared highest were chosen more often in the
construction of the final test.

This process of shifting from a study of eight classes of pitch
direction to thirty-two subclasses threatened to thin out some
portions of the data. Some of the subclasses were left with few
items. In setting up the original design no provision had been made
for studying the data in this way, nor was it foreseen that this
situation would arise. Where subclasses contained smaller numbers of
cases, it made for more unreliability of the validity indices. Since
it was realized that the other factors of interval combination and
pitch spacing were likely to be influencing the results, the
unreliability of these values based on smaller numbers became serious.

More items were needed for these cases, and a search was made
through data on item validity computed for the Q3 and R1 tests. An
examination of the pitch directions of the items used in these tests
revealed many which conformed to the patterns of those used in the
experimental test design. The reliability coefficient of the Q3 test
was +.72⁸ using junior high school pupils, and it was felt to be high
enough to justify the use of test items on that basis. The sampling
of both tests included music and nonmusic groups in secondary schools
in a manner which made it similar to the sampling used in the
experimental design. On the basis of this reasoning there appeared
more justification in admitting selected items from the Q3 and R1
test forms than in using certain of the small numbers of specific
categories available only from the three T tests. Table VI presents
data from the T tests augmented by selected material from the Q3 and
R1 tests. This table constituted still another source of reference
material for use in later test construction.

8 See p. 38.

TABLE VI

Validity Indices and Error Percentages of the Factors of Pitch Direction
and Answer Number from 254 Test Items Taken from Tests
Q3, R1, T1, T2, and T3

Type of Pitch    Answer      Number      Validity    Mean Percent
Direction        Position    of Items    Index       of Error

I                   1            5         .498         .608
                    2            2         .750         .763
                    3            7         .620         .530
                    4            4         .325         .389

II                  1            7         .567         .514
                    2            6         .533         .622
                    3            4         .513         .600
                    4            2         .490         .271

III                 1            4         .463         .728
                    2            4         .588         .655
                    3            6         .478         .349
                    4            9         .481         .393

IV                  1            5         .526         .553
                    2           10         .528         .496
                    3            8         .445         .347
                    4            5         .442         .346

V                   1            8         .540         .587
                    2           13         .433         .406
                    3            9         .547         .434
                    4           10         .548         .500

VI                  1           13         .667         .750
                    2            8         .515         .460
                    3            8         .580         .449
                    4            8         .500         .388

VII                 1           11         .466         .662
                    2           12         .452         .553
                    3           16         .525         .537
                    4           10         .548         .433

VIII                1            8         .443         .681
                    2            9         .371         .598
                    3            6         .472         .344
                    4           17         .516         .394
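The way Table VI was consulted, taking for each pitch direction the answer position with the highest validity index, can be sketched as a simple lookup. The figures below are copied from Table VI; the dictionary layout and names are illustrative only.

```python
# Table VI: direction -> {answer position: (number of items, validity index)}
TABLE_VI = {
    "I":    {1: (5, .498),  2: (2, .750),  3: (7, .620),  4: (4, .325)},
    "II":   {1: (7, .567),  2: (6, .533),  3: (4, .513),  4: (2, .490)},
    "III":  {1: (4, .463),  2: (4, .588),  3: (6, .478),  4: (9, .481)},
    "IV":   {1: (5, .526),  2: (10, .528), 3: (8, .445),  4: (5, .442)},
    "V":    {1: (8, .540),  2: (13, .433), 3: (9, .547),  4: (10, .548)},
    "VI":   {1: (13, .667), 2: (8, .515),  3: (8, .580),  4: (8, .500)},
    "VII":  {1: (11, .466), 2: (12, .452), 3: (16, .525), 4: (10, .548)},
    "VIII": {1: (8, .443),  2: (9, .371),  3: (6, .472),  4: (17, .516)},
}

# Answer position with the highest validity index for each pitch direction.
best_position = {direction: max(cells, key=lambda pos: cells[pos][1])
                 for direction, cells in TABLE_VI.items()}

total_items = sum(n for cells in TABLE_VI.values() for n, _ in cells.values())

print(total_items)
for direction, pos in best_position.items():
    n, index = TABLE_VI[direction][pos]
    print(direction, pos, index, "based on", n, "items")
```

The item counts sum to 254, matching the table title. Printing the cell size beside each choice makes plain how thin some cells are: the top value for direction I rests on only two items, which is the unreliability the text warns of.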

The  Factor  of  Pitch  Spacing  Among  Intervals 

The  following  tabulation  shows  validity  values  and  mean  errors 
of  items  grouped  according  to  the  three  degrees  of  spacing  used  in 
the  tests. 

Spacing
(Half Steps)     Validity Value     Mean Error

1 to 2               .488              49.0
3 to 5               .510              53.7
6 to 7               .546              68.3

The chief use for this information was found in the column showing
mean error of the different degrees of spacing, which was of value in
making certain adjustments in the difficulty of constructed items used
in the final test.

Short  Experiment  for  Securing  Additional  Easy  Items 

It has been noted in Table IV that the means of all three
experimental tests showed over fifty percent of the items were in
error. There was need, therefore, for procuring a larger number of
easier items. Consideration was given to the possibility of using
some interval combinations not previously used which would make for
easier discriminations. A presumptive superiority of the validity of
intervals of the mixed fusion and dissonance groups had been further
strengthened by the appearance of more of their numbers in the top
ranking combinations found in Table V. This led to the desirability
of using the dissonant major and minor seconds in some way. This was
effected by placing the notes an octave apart, thus making them into
major and minor ninths. These latter intervals, though still
dissonant, were less startling and disagreeable than when the notes
were in close proximity, but they yet retained the particular type of
tone quality which has appeared so valid for test purposes.

The two intervals of the major and minor ninths were then paired
with some of the mild dissonance intervals and also with some of the
consonances. The new combinations were made up into multiple-response
items using pitch direction and answer positions selected from Table
VI. The test was labeled the E (Easy) test.⁹ Paired intervals
selected for use in this test were:

Minor 6th and minor 9th
Major 6th and major 9th
Major 6th and minor 9th
Minor 9th and minor 7th
Minor 7th and major 9th
Perfect 5th and major 9th
Perfect 4th and major 9th

9 A copy of this detailed material has been deposited in the Psychology
Library, Columbia University.

The test was given to sixty-four children from the sixth and
seventh grades. Item validity was computed by comparing the
differences between the upper and lower twenty-seven percent of the
cases. Data from this experiment were used in later test building.
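The exact statistic derived from the upper and lower twenty-seven percent is not specified. One simple possibility, assumed here for illustration, is the difference between the proportions of the two extreme groups answering the item correctly. The function name and the sample data are hypothetical.

```python
def upper_lower_index(item_correct, total_scores, fraction=0.27):
    """Proportion correct in the top group minus that in the bottom group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))          # size of each extreme group
    # Subject indices ordered from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# With sixty-four cases, each extreme group holds seventeen subjects.
print(round(64 * 0.27))

# Hypothetical data for ten subjects, 1 = item answered correctly.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(round(upper_lower_index(item, scores), 3))
```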

Construction  of  the  Final  Instrument  of  Measurement 

Extrinsic  and  Intrinsic  Considerations 

The building of the final test depended upon meeting the
requirements of certain extrinsic as well as intrinsic needs of the
test. Extrinsic needs had to do with the manner of introducing and
presenting the test material itself. Intrinsic needs have largely
been discussed in the sections devoted to validity of items and
factors. The construction of the test proceeded on the basis of the
following considerations and in the order named:

1. Directions, score sheets, and other introductory materials.

2. Length of the test and random distribution of answers.

3. Distribution of items according to difficulty.

4. Selection of paired intervals.

5. Pitch direction, answer, and spacing factors.

6. Recording the test.

The assembling of items and the construction of additional items
proceeded chiefly through use of data found in the reference material
already discussed in this chapter.

The account of the completion of the final test is presented with
as much attention to detail as possible. All items and other elements
of construction are identified so that the reader, if he wishes, may
follow any phase of the work undertaken. The pattern of the test,
Form 4 of the T Series, is found in Table VII.

Directions  and  Other  Introductory  Material 

Since the method of discrimination called for in the test was
unlike anything found in music experience or instruction, ample
explanation and musical examples were necessary, and considerable
attention was paid to orienting the listener to the type of situation
which the test presented.

TABLE VII

Detailed Patterns of Items Used in Form 4 of the T Series

Test              Pitch        Pitch      Interval        Percent of   Source of
Items   Answer    Direction*   Spacing    Combination     Error†       Item‡

1  a      4       IV           1-2        Ma 3 - Ma 7                  Form E, 1a
   b      3       VIII         1-2        Ma 6 - Ma 9                  Form E, 1c
   c      1       II           1-2        P 4 - Ma 9                   C
   d      2       V            1-2        Mi 3 - Ma 7                  C
   e      4       V            3-5        Mi 7 - Mi 9                  C
   f      3       V            1-2        P 5 - Mi 7                   C
   g      4       II           1-2        Mi 6 - Ma 7     15           Form 3, 1e
   h      1       VI           1-2        Mi 9 - Ma 6                  C
   i      3       I            1-2        P 4 - Mi 7                   C
   j      2       III          1-2        Mi 7 - Ma 9                  C

2  a      3       I            1-2        Ma 3 - Mi 7     26           Form 3, 1d
   b      1       IV           3-5        Ma 7 - Mi 6     41           Form 3, 5b
   c      4       VI           3-5        P 5 - Mi 7                   C
   d      3       I            3-5        Ma 7 - Mi 6                  C
   e      1       II           1-2        P 4 - Mi 7      [illegible]  Form 2, 1g
   f      2       II           6-7        Mi 6 - Ma 7     66           Form 1, 4f
   g      4       VII          3-5        Ma 7 - Ma 6     62           Form 1, 5c
   h      2       I            3-5        Ma 3 - Dim 5    74           Form 2, 3j
   i      3       I            3-5        P 5 - Ma 6                   C
   j      1       II           6-7        Mi 3 - Mi 7     66           Form 1, 1d

3  a      1       IV           1-2        Mi 6 - Mi 9                  C
   b      3       VI           6-7        P 4 - Mi 3      81           Form 1, 5h
   c      2       II           1-2        Ma 7 - P 4      32           Form 1, 1b
   d      4       V            3-5        Mi 7 - Mi 9                  C
   e      2       VII          3-5        Ma 3 - Ma 6     79           Form 2, 4i
   f      4       V            6-7        Ma 7 - Dim 5    65           Form 2, 4b
   g      3       I            3-5        Dim 5 - Ma 7                 C
   h      1       VI           6-7        P 4 - P 5       93           Form 1, 5b
   i      4       V            3-5        Ma 3 - Mi 7                  C
   j      2       III          3-5        Mi 7 - Dim 5                 C

4  a      3       I            3-5        Ma 6 - Ma 7                  C
   b      2       III          3-5        Mi 7 - Ma 3     73           Form 2, 1b
   c      3       I            3-5        Dim 5 - P 5                  C
   d      1       VI           3-5        P 5 - Mi 3      78           Form 2, 2a
   e      2       I            6-7        Mi 7 - P 5      79           Form 1, 2a
   f      4       VII          3-5        P 5 - Ma 9                   C
   g      1       VI           6-7        Mi 7 - Dim 5    86           Form 2, 5c
   h      3       I            6-7        Ma 3 - Mi 7     61           Form 1, 1g
   i      1       VI           6-7        Mi 7 - Mi 3     82           Form 2, 2f
   j      4       VIII         1-2        Ma 7 - Ma 6     62           Form 3, 3j

5  a      4       VII          3-5        Mi 3 - Ma 7                  C
   b      2       III          3-5        Mi 7 - Ma 9                  C
   c      1       VI           6-7        Ma 3 - Dim 5                 C
   d      2       III          3-5        P 4 - Mi 7                   C
   e      4       VI           3-5        P 5 - Ma 7      25           Form 1, 1a
   f      1       VI           6-7        Mi 7 - Dim 5    86           Form 2, 2j
   g      3       I            1-2        Ma 3 - Mi 7     26           Form 3, 1d
   h      2       IV           1-2        Ma 7 - Mi 6     51           Form 2, 5b
   i      3       I            3-5        Mi 6 - Mi 9                  C
   j      1       IV           3-5        Ma 7 - Mi 6                  C

* Roman numeral indicates pitch direction in classification. See p. 50.

† Percent of error given when known.

‡ Items taken from Forms 1, 2, and 3 of the T series. C indicates item was
subsequently constructed.

In the directions of the final test, the listening attention of the
hearer is first directed to similarities in interval quality. This is
done by sounding four similar intervals but on different pitch levels.
The listener is told that intervals can also differ in quality,
whereupon four more intervals are sounded, one of which is different
in quality from the other three. The specific requirements of the
test, as well as the manner of recording answers, are then stated, and
the directions culminate in the presentation of a number of practice
exercises.

Length  of  the  Test  and  the  Distribution  of  Answer  Positions 

The length of the test is set at fifty items. In the test each item
is repeated for the convenience of the listener to insure that full
attention has been secured on each item. The complete test, together
with the necessary directions, examples, and practice exercises can
be administered within a class period of forty-five minutes. Actual
time depends usually upon the ability of the group to understand and
follow directions and to concentrate upon the materials of the test
itself.

In the building of the test the distribution of the four answer
positions was first made. Approximately equal numbers of each of the
four answer positions were arranged in random order. This order then
determined the sequence of answer numbers for the test. Items were
arranged in five groups of ten each. The numbers of answers for each
of the four positions are as follows:

Answer Position     Number Used

      1                 13
      2                 12
      3                 13
      4                 12
                        --
                        50

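These totals can be checked against the Answer column of Table VII. In the sketch below the key is transcribed from that column as read here, one row of ten per group; the variable names are illustrative.

```python
from collections import Counter

# Answer positions for items 1a-1j, 2a-2j, ... 5a-5j, read from Table VII.
ANSWER_KEY = [
    4, 3, 1, 2, 4, 3, 4, 1, 3, 2,
    3, 1, 4, 3, 1, 2, 4, 2, 3, 1,
    1, 3, 2, 4, 2, 4, 3, 1, 4, 2,
    3, 2, 3, 1, 2, 4, 1, 3, 1, 4,
    4, 2, 1, 2, 4, 1, 3, 2, 3, 1,
]

counts = Counter(ANSWER_KEY)
for position in sorted(counts):
    print(position, counts[position])
print(len(ANSWER_KEY))
```

The tally gives 13, 12, 13, and 12 for positions 1 through 4, in agreement with the numbers stated.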
Distribution  of  Items  According  to  Difficulty 

Provision for the difficulty of individual items and their
distribution on this basis throughout the test had to be made despite
the handicap of incomplete knowledge of this factor. This limitation
resulted from an anticipated lack of items of varying difficulty in
the experimental tests. The manner in which estimates of difficulty
of constructed items were made is presented later in this section.


For the test as a whole a mean error of from twenty to twenty-five
items out of fifty was held desirable, based on the errors made by
the particular sampling of school population used in the first three
T tests. An average error of about forty percent was set, therefore,
for the fifty items of the test.

Items were not to be arranged in order of progressive difficulty
because of a possible discouraging effect it might have on some
persons. The test was begun with some very easy items and was
concluded with a group of items of moderate difficulty. In the main
body of the test difficult items were interspersed with those not so
difficult. A general plan was set up which was adhered to as much as
possible under the circumstances. The arrangement was as follows:

                     Percent of Approximate
Item Numbers         Range of Error

 1 to  5                  5 to  8
 6 to 25                 10 to 75
26 to 40                 25 to 90
41 to 50                 15 to 50

The only items on which difficulty was known were those taken
directly from the three experimental tests. On all constructed items
a rough approximation of difficulty had to be made by taking into
consideration the error values of each of the subclasses used. These
approximations are not reported because of their subjectivity. Items,
rated in this manner, were placed in the test on the basis of the
plan for the distribution of difficulty just presented.

Pitch  Direction,  Answer,  and  Spacing  Factors 

Table  VI,  listing  validity  indices  of  pitch  direction  and  answer 
combinations,  served  as  a  reference  in  the  selection  of  patterns  of 
interval  arrangement  for  newly  constructed  items.  After  the  se- 
lected items  from  the  experimental  T  tests  had  been  distributed 
throughout  the  test,  items  yet  to  be  constructed  were  given  answer 
numbers  corresponding  to  the  random  distribution  of  answers  which 
had  previously  been  set  up  for  the  test.  Items  requiring  certain 
answer  positions  were  assigned  one  of  several  pitch  directions 
showing  high  values  for  validity.  The  use  of  the  various  degrees 
of  pitch  spacing  helped  in  the  final  approximation  of  the  difficulty 
of  each  item  constructed.  An  aid  in  this  work  was  the  tabulation 
of errors¹⁰ on items grouped according to these three classifications.

The selection of interval combinations, in reality, was a matter
which meant the furnishing of basic tonal material for the test. The
underlying principle in this selection called for the provision of a
basic range of intervallic material which at the same time allowed for
a concentration of paired intervals ranked high in validity value
according to Table V.

10 See p. 58.

A  plan  drawn  up  set  aside  seventy-five  percent  of  all  items  used 
in  the  new  test  for  the  presentation  of  pairs  used  in  the  experimental 
test,  which  intervals  are  found  in  Table  V.  Slightly  more  than 
one-half  of  these  items  so  set  aside  presented  combinations  from  the 
top  twenty-five  percent  of  all  pairs  ranked  in  this  table.  The  re- 
mainder of  these  items  presented  pairs  from  the  lower  seventy-five 
percent  of  the  table.  Selected  pairs  from  the  E  test,  presenting 
major  and  minor  ninths,  were  used  in  the  remaining  twenty-five 
percent  of  the  items  of  the  entire  test.  In  tabular  form  this  plan 
of  selection  appears  as  follows : 

Source of Material                Number of Items    Percent of All Items in the Test

Top 25 percent of Table                 21  )
                                            )                    75
Lower 75 percent of Table               16  )

Selections from the E Test              13                       25

                                        50
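
The arithmetic behind this plan can be checked with a short Python sketch. The item counts are those of the tabulation above; the dictionary and function names are illustrative only and do not come from the study. The sketch suggests that the stated seventy-five and twenty-five percent are rounded figures for 37 and 13 items out of fifty.

```python
# Item counts taken from the tabulation above; names are hypothetical.
ALLOCATION = {
    "Top 25 percent of Table V": 21,
    "Lower 75 percent of Table V": 16,
    "Selections from the E test": 13,
}

def percent_share(count, total):
    """Percent of the whole test represented by `count` items."""
    return 100 * count / total

total_items = sum(ALLOCATION.values())
table_v_items = (ALLOCATION["Top 25 percent of Table V"]
                 + ALLOCATION["Lower 75 percent of Table V"])

table_v_share = percent_share(table_v_items, total_items)
e_test_share = percent_share(ALLOCATION["Selections from the E test"], total_items)
# Share of the Table V items drawn from the top quarter of the ranking.
top_quarter_share = percent_share(ALLOCATION["Top 25 percent of Table V"],
                                  table_v_items)

print(total_items, table_v_share, e_test_share, round(top_quarter_share, 1))
```

On these counts the top quarter of the table supplies a little over half of the Table V items, which agrees with the statement that "slightly more than one-half" of them came from that source.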

A  consideration  referred  to  once  before  in  this  study  has  to  do 
with  the  general  distribution  of  consonance-dissonance  feeling  for 
the  tonal  material  presented.  This  balance  was  checked  in  a  general 
way  in  order  that  no  undue  accumulation  of  either  consonant  or 
dissonant  intervals  would  take  place  through  neglect  of  this  factor. 

Recording  the  Test 

All  preliminary  tests  had  been  administered  on  a  small  reed 
organ.  In  order  to  render  uniform  the  giving  of  the  test,  a  re- 
cording was  made  which  included  spoken  directions,  practice  exer- 
cises, and  other  introductory  material.  The  recording  was  made 
using  for  the  sound  stimulus  the  notes  of  an  electric  organ.  The 
entire  test  with  directions  is  presented  on  three  ten-inch  records, 
using  both  sides.  Directions  and  other  introductory  material  take 
one  complete  side  of  a  disc.  Each  remaining  side  accommodates  ten 
items.  Each  item  is  repeated  for  the  assurance  of  the  hearer  in 
checking  first  judgments.  This  device,  although  it  consumed  more 
time,  seemed  to  contribute  to  the  confidence  of  those  taking  the  test. 

The  chapter  which  follows  presents  the  results  of  the  T4  test, 
compared  with  the  various  criteria  which  were  assembled  for  the 
purpose  of  validation. 


CHAPTER  IV 

A  STUDY  OF  THE  PERFORMANCE  OF  THE  T4  TEST 

This  chapter  presents  statistical  data  for  the  study  of  some  of  the 
relationships  which  have  been  the  concern  of  this  investigation.  All 
relationships  have  to  do,  in  one  manner  or  another,  with  the  per- 
formance of  the  T4  test — its  reliability,  its  validity,  its  power  to 
differentiate  between  groups  on  the  basis  of  musical  ability,  and  its 
relationship  to  certain  standardized  music  tests. 

Topics Contributing toward a Better Understanding of the Data

In  order  that  the  studies  which  follow  may  be  more  fully  under- 
stood, a  discussion  is  presented  of  a  number  of  topics,  each  of  which 
bears  a  definite  relationship  to  certain  aspects  of  the  subjects 
studied.  These  topics  are  concerned  with  the  status  of  the  T4  test 
itself,  a  review  of  an  important  validation  study  of  the  Seashore 
tests,  and  an  evaluation  and  description  of  types  of  criteria  con- 
sidered in  the  validation  studies. 

Status of the T4 Test in Relation to the Study as a Whole

The  T4  test  form  throughout  this  chapter  and  the  next  should  be 
regarded  as  an  experimental  though  final  instrument  of  measure- 
ment of  the  study,  designed  for  the  purpose  of  carrying  out  several 
aspects  of  exploration  in  the  area  encompassed  by  the  investigation. 
It  constitutes  the  instrument  of  measurement  by  which  the  validity 
of  the  use  of  interval  discrimination  as  an  index  of  musical  ability 
is  examined.  It  also  constitutes  the  means  whereby  differences  in 
score  on  the  function  of  interval  discrimination  can  be  ascertained 
among  certain  groups  considered  high  and  low  with  respect  to  vari- 
ous criteria  of  musical  ability. 

The  T4  test,  therefore,  should  not  be  considered  a  standardized 
test  for  specific  levels  of  ability  even  though  the  data  for  its  con- 
struction were  taken  from  a  certain  sampling  of  school  population 
from  grades  five  to  twelve.  The  purpose  of  the  test  was  to  provide 
an  instrument  which  would  at  least  be  able  to  measure  students  from 
grades  five  to  twelve,  and  one  which  might  prove  useful  for  studying 
groups  outside  of  these  academic  limits.  Hence  this  final  test  must 
be  considered  as  representative  of  a  number  of  forms  which  can  be 
built  upon  the  same  principles  of  construction  and  which  can  be 
standardized  to  meet  the  needs  of  different  ability  groups. 

A Validation Study of the Seashore Test Battery

A report by Larson¹ describing certain phases of his experiences
in  music  testing  presents  among  other  things  an  account  of  a  valida- 
tion of  the  Seashore  tests  at  the  Eastman  School  of  Music.  Larson's 
article  appeared  at  a  time  when  the  editors  of  the  Music  Educators 
Journal  were  presenting  both  sides  of  a  controversy  over  the  validity 
of  certain  tests  of  music,  chief  of  which  was  the  Seashore  test  bat- 
tery. Larson,  in  the  course  of  his  article,  casts  grave  doubt  on  the 
reliability  and  validity  of  many  of  the  teacher  estimates  which  have 
been  used  by  certain  investigators  to  show  validity  or  a  lack  of 
validity  of  the  music  tests  themselves.  He  intimates  that  many  of 
these  reported  investigations  have  been  conducted  by  persons  with 
no  real  testing  or  research  experience.  He  also  points  out  that  high 
correlations  between  academic  grades  and  tests  are  difficult  to 
secure. 

He then relates a very carefully conducted and controlled validation
study at the Eastman School which resulted in a correlation of +.59
between initial Seashore tests, given the first week of the school
year, and a final objective examination in a musical theory course
given at the end of the school year. As an example of the limited
extent to which tests are related to final grades in academic
subjects, he quotes a correlation of +.50 on the widely used American
Council on Education Psychological Examination for College Freshmen.

Elsewhere  in  his  article  Larson  enlarges  upon  his  own  use  of 
grades  in  musical  theory  at  the  Eastman  School  as  criteria  for  test 
validation.    He  says  in  part : 

Several  years  ago  the  writer  .  .  .  was  attracted  by  the  importance  of  the 
Theory  I  course.  The  nature  of  this  course  is  such  that  it  has  been  recognized 
as  a  key  course  in  the  curriculum  because  it  indicates  general  musicality,  and 
any  student  who  has  great  difficulty  with  it  is  considered  a  questionable  student 
for  continuance  in  the  regular  course ;  in  fact,  Theory  I  and  its  sequel,  Theory  II, 
are  absolute  requirements  for  all  students  who  are  granted  the  Bachelor  of 
Music  degree. 

He  places  great  emphasis  upon  the  reliability  of  these  particular 
grades.  He  reports  that  they  were  built  upon  an  elaborate  series  of 
objective  tests  in  theory  developed  at  the  Eastman  School  of  Music, 
the  purpose  of  which  was  the  freeing  of  grades  from  the  customary 
subjectivity  surrounding  teacher  ratings.  In  view  of  the  careful- 
ness with  which  these  criterion  grades  were  issued,  the  results  of  the 
entire study must be considered significant. Furthermore, Larson's
entire procedure lends substantial support to the use of grades in
theory as a fitting criterion for the validation of certain music tests,
especially when the grades are based upon careful objective methods
of evaluation.

1 Larson, William S. "Practical Experience with Music Tests." Music
Educators Journal, Vol. XXIV, No. 5, March, 1938, pp. 31, 68-73.

His  results  on  the  Seashore  validation  study  may  also  be  given 
some  weight  as  far  as  the  Seashore  tests  are  concerned,  in  view  of  his 
former  status  as  George  Eastman  Research  Fellow  in  the  Psychology 
of  Music  at  the  University  of  Iowa,  the  institution  in  which  Seashore 
carried  out  his  own  research  studies.  His  report  on  this  study  would 
also  appear  to  be  the  one  important  validation  of  this  test  battery 
that can be relied upon, since the revised Seashore Manual² offers no
validity coefficients.³

The  Concept  of  Tonal  Learning 

The concept of tonal learning used in one of the validation
studies in this chapter is held by Flagg⁴ to be a basic factor in
musical growth. Enlarging upon this concept of tonal learning she
explains:⁵

What  we  are  attempting  to  say  is  that  growth  in  musical  apprehension, 
whether  with  young  children  having  their  first  musical  experiences  or  with  older 
students  approaching  for  the  first  time  a  systematic  organization  of  the  ele- 
ments of  past  musical  experience  .  .  .  such  growth  must  be  primarily  concerned 
with  a  deepening  penetration  and  precision  in  tonal  learning  in  bringing  more 
and  more  complex  musical  organizations  into  awareness,  rooted  always  in  in- 
creasing awareness  of  tonal  relations  and  in  the  internal  demands  set  up  by 
them. 

It  may  be  observed  that  this  basic  factor,  according  to  the  author, 
operates  in  all  musical  experience;  it  must  be  experienced  by  the 
young  child  as  well  as  the  older  student  who  would  engage  in  formal 
study  of  the  elements  of  music.  The  root  of  all  understanding,  she 
concludes,  is  to  be  found  in  a  certain  awareness  of  tonal  relations. 
The similarity between this latter concept and the psychology of
interval perception discussed in Chapter I⁶ should be a matter of
note at this point.

2 Seashore, Lewis, and Saetveit, op. cit., p. 17.

3 The only reference in the manual to the validity of the tests is the
following statement: "The validity of these measures must be interpreted
in terms of the extent to which they function in the actual musical
situation." Other aspects of validity of these tests are discussed in
the following monograph: Saetveit, Joseph G., Lewis, Don, and Seashore,
Carl E. The Revision of the Seashore Measures of Musical Talents.
University of Iowa Press, Iowa City, Ia., 1940.

4 At the time this investigation was carried on Miss Marion Flagg was
Director of Music at the Horace Mann School, Teachers College, Columbia
University.

5 Flagg, Marion. "Tonal Learning: The Basic Factor in Musical
Growth." Education, May, 1939.

Miss  Flagg's  aid  was  obtained  in  securing  her  estimates  on  this 
trait  for  students  taking  the  T4  test.  Specific  modes  of  behavior 
associated  in  actual  classroom  instruction  with  the  general  concept 
of tonal learning were defined for the purpose of this study as "the
ability of the ear to lay hold and to perceive tonal material based
upon (a) the ability to sing a memorized melody accurately and (b)
the ability to sing a dictated melody and a difficult interval."

A comparison between Flagg's concept of tonal learning and
Seashore's use of the term tonal imagery⁷ gives rise to the belief that
both concepts may in some way refer to a single though highly
important phase of musical ability.

The Report of a Theory Committee

A significant statement on the importance of theory study is
found in a report by the Theory Committee⁸ at Teachers College,
Columbia University. The report reads in part:

It  (theory)  is  to  the  musician  what  a  thorough  knowledge  of  English  is  to 
the  general  student.  .  .  .  The  knowledge  of  the  structure  of  music  serves  two 
equally  important  objectives:  the  development,  first  of  standards  of  taste  or 
evaluation  and,  second,  of  needed  professional  skills. 

In another section of the report is a reference to the contribution
of the study of theory toward the "appreciation" of music.

... we believe that the vital connection between so-called "theory" and
"appreciation" should be adequately stressed. Both are only two phases
of a larger problem: how to make people intelligently aware and
consequently more sensitive to music.

The  first  quotation  should  serve  as  further  confirmation  of  the 
importance  of  the  study  of  theory  in  any  basic  philosophy  of  music 
education,  and  should  further  justify  the  use  of  marks  in  theory  as 
criteria for the validation of certain tests of music. The latter
quotation is important in another sense, for it links musical
"appreciation" with a knowledge of musical structure as presented in
music theory classes. Such a link is an important factor in the future
interpretation of the validity of the T4 test, for should such
validation appear significant through the use of criterion grades in
theory, it would not be unreasonable to interpret test scores in terms
of some capacity to "appreciate" music, at least appreciation for
those aspects of musical understanding having to do with an awareness
of certain structural values of music.

6 See pp. 18 ff.

7 See pp. 21 ff.

8 Music Education Study Conference (1939), Report of Theory Committee.
Teachers College, Columbia University. Unpublished manuscript.

The  Problem  of  Securing  Numbers  of  Cases 

The  difficulty  of  securing  reliable  and  valid  teacher  estimates  has 
been  discussed.  Another  aspect  of  this  problem  is  the  necessity  of 
obtaining  these  estimates  in  sufficient  numbers  to  warrant  correla- 
tion studies.  It  was  the  experience  in  this  investigation  that  teacher 
estimates  of  student  ability  tended  to  be  more  or  less  confined  to  a 
comparison  of  students  within  given  classes.  Grades  in  course  work 
in  music  seemed  easiest  to  obtain  when  the  comparisons  were  made 
within  classes.  The  only  exception  was  the  situation  found  in  well- 
organized  conservatories  where  instruction  and  grading  were  more 
or  less  standardized  for  all  classes.  The  smaller  numbers  of  cases 
in  some  of  the  data  reported  in  this  chapter  should  be  viewed  in  the 
light  of  these  conditions.  Although  certain  teacher  estimates  are 
introduced  as  criteria,  the  significance  of  the  objectives  of  instruc- 
tion, the  status  of  institutions  in  which  the  work  was  carried  on,  and 
the  exploratory  nature  of  the  whole  validation  procedure  seem  to 
make  it  worth  while  to  examine  this  type  of  data. 

Scores  on  the  T4  Test 

All scores on the T4 test, as in previous tests, are stated in terms
of number wrong; consequently, the lower the score the better the
response on the test. Test data, therefore, represent raw scores,
untreated by any correction formula.

Studies  on  Reliability 

Three  studies  of  the  reliability  of  the  test  are  reported.  The  first 
is  a  split-half  correlation.  The  remaining  two  are  retest  studies,  one 
on  the  elementary  and  junior  high  school  level,  the  other  on  the  col- 
lege level.  These  reliability  coefficients  must  be  considered  as  apply- 
ing only  to  the  specific  situations  where  this  testing  was  done. 

A split-half correlation using 658 test papers was made up from the
following student groups:

                                                          N

Unselected students of seventh and eighth grades         208
Music students of high school and college age            175
Unselected students in a teachers college                275

                                                         658

The two halves were obtained by separating the odd from the even
items. A correlation of +.72 was obtained for the two halves of the
test. Applying the Spearman-Brown formula, a reliability value of +.84
was obtained for the test as a whole.
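
The step from the half-test correlation to the full-test reliability can be sketched in Python. The Spearman-Brown formula for a test lengthened by a factor k is r_k = k r / (1 + (k - 1) r); doubling a half test gives k = 2. The function names and the six-item sample paper are illustrative assumptions, not material from the study.

```python
def odd_even_scores(item_errors):
    """Split one paper into odd-item and even-item error counts.

    `item_errors` lists items in test order, 1 for an item answered
    wrongly and 0 for an item answered correctly.
    """
    odd_half = sum(item_errors[0::2])   # items 1, 3, 5, ...
    even_half = sum(item_errors[1::2])  # items 2, 4, 6, ...
    return odd_half, even_half

def spearman_brown(r_half, factor=2):
    """Estimated reliability of a test `factor` times the half length."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# Hypothetical six-item paper: items 1, 4 and 5 were missed.
sample_halves = odd_even_scores([1, 0, 0, 1, 1, 0])

# Half-test correlation reported in the text.
full_test_reliability = spearman_brown(0.72)
print(sample_halves, round(full_test_reliability, 2))
```

With the reported half-test correlation of +.72 the formula gives 1.44 / 1.72, or about +.84, in agreement with the value stated above.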

Two  retests  were  carried  on  with  more  homogeneous  academic 
groups  than  for  the  split-half  correlation.  The  first  study  of  retest- 
ing  was  made  in  grades  six  to  eight,  where  257  children  were  re- 
tested  within  a  period  of  from  four  to  five  weeks  from  initial  testing. 

Retests  were  also  made  of  167  college  students  over  a  period  of 
from  one  to  five  months.  The  reliability  coefficients  on  retesting  for 
grades  six  to  eight  and  for  college  students  were  +  .74  and  +  .76, 
respectively.  A  tabulation  of  the  two  retest  studies  includes  means 
and  standard  deviations  as  follows : 

                       N      M₁      SD₁      M₂      SD₂       r

Grades 6 to 8         257    27.71    5.84    25.78    6.02    +.74
College students      167    21.30    6.05    20.33    6.51    +.76

Studies on Validity

On the Secondary School Level

The first study of validity to be reported is a correlation between
scores on the T4 test and teacher estimates of tonal learning. A
discussion of this criterion is presented earlier in the chapter.
Pupils in the seventh, eighth, and ninth grades were tested and then
ranked on the criterion by the instructor, who taught the classes
herself. Each class was ranked separately. Correlations, computed by
the rank-order method and changed to product moment values, are
presented in Table VIII.

TABLE VIII

Correlations of Scores on the T4 Test with Estimates of
Tonal Learning

Grade      N       M       SD       r

  7       33     20.63    7.71    +.71
  8       23     18.21    7.56    +.52
  9       32     18.47    7.80    +.59
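
The text does not state which formula was used to change the rank-order coefficients to product moment values. The Python sketch below assumes the conversion commonly attributed to Pearson, r = 2 sin(πρ/6), and applies it to a hypothetical class of six pupils; the rankings are invented for illustration and are not data from the study.

```python
import math

def rank_order_rho(ranks_a, ranks_b):
    """Rank-order (Spearman) coefficient for two rankings without ties."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def rho_to_product_moment(rho):
    """Assumed conversion of rho to a product moment value."""
    return 2 * math.sin(math.pi * rho / 6)

# Hypothetical rankings: rank 1 is the fewest errors on the test and
# the highest teacher estimate, respectively.
test_ranks = [1, 2, 3, 4, 5, 6]
teacher_ranks = [2, 1, 4, 3, 6, 5]

rho = rank_order_rho(test_ranks, teacher_ranks)
r = rho_to_product_moment(rho)
print(round(rho, 3), round(r, 3))
```

Under this assumed conversion the product moment value is slightly larger than the rank-order coefficient for intermediate values, and the two agree at 0 and at 1.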

The  correlations  for  the  seventh  and  ninth  grades  are  significant 
at  the  one  percent  level.  The  value  for  the  eighth  grade  is  signifi- 
cant only  at  the  five  percent  level.  The  instructor  making  these 
ratings  made  no  claim  for  knowing  the  three  classes  equally  well. 
She also observed that since tonal learning was an objective of
instruction, and hence subject to improvement, the work of rating was
complicated by the necessity for estimating improvements of certain
students over ability shown earlier in the year.

Another  study  of  validity  in  a  secondary  school  was  made  at  the 
High  School  of  Music  and  Art,  New  York  City.  The  group  studied 
was  carrying  on  special  work  in  musical  theory  which  compared 
favorably  with  that  carried  on  in  conservatories,  according  to  in- 
structors in  the  school.  Criteria  consisted  of  grades  in  theory  for 
five  semesters,  and  teacher  estimates  of  intrinsic  ability  for  the  work 
in  which  they  were  then  engaged.  Grades  were  given  in  percents, 
and  teacher  estimates  were  given  in  terms  of  a  rank  order  list  of  the 
class.  Statistical  results  for  this  study  are  shown  in  the  following 
tabulation : 

                                                      N      M      SD      r

T4 test and composite grades for five semesters       27   10.20   5.30   +.55
T4 test and teacher estimates of ability              27   10.20   5.30   +.39

The  first  correlation  is  significant  at  the  one  percent  level,  the  second 
at  the  five  percent  level. 
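
One conventional way of checking such significance statements is to convert each correlation to a t ratio, t = r √(N − 2) / √(1 − r²), and compare it with tabled values for N − 2 degrees of freedom. The Python sketch below does this for the two correlations just reported (N = 27). It is an illustration only; the text does not say which procedure or tables the author used, and the critical values are the standard two-tailed entries for 25 degrees of freedom.

```python
import math

N_CASES = 27  # number of cases in both correlations above

# Two-tailed critical values of t for 25 degrees of freedom,
# taken from standard tables.
T_CRITICAL_DF25 = {"5 percent": 2.060, "1 percent": 2.787}

def t_from_r(r, n):
    """t ratio for testing a correlation of r from n cases against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def levels_reached(r, n=N_CASES):
    """Significance levels whose critical value the t ratio equals or exceeds."""
    t = abs(t_from_r(r, n))
    return [level for level, crit in T_CRITICAL_DF25.items() if t >= crit]

grades_levels = levels_reached(0.55)     # composite grades in theory
estimate_levels = levels_reached(0.39)   # teacher estimates of ability
print(grades_levels, estimate_levels)
```

On this check the correlation with composite grades passes both levels, while the correlation with teacher estimates passes only the five percent level, which agrees with the statement above.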

These two validation studies, with estimates of tonal learning
and status in musical theory as criteria, while not unduly impressive
because of the lack of a sizeable number of cases, should nevertheless
be noted because of the nature of the criteria and because they were
carried out with secondary school students.

On  the  College  and  Conservatory  Level 

Correlations  between  scores  on  the  T4  test  and  various  grades 
issued  by  the  Department  of  Theory  at  the  Juilliard  School  of  Music 
are  next  presented.  Three  prominent  subjects  in  theory  at  this 
institution  are  sight-singing,  musical  dictation,  and  written  theory, 
all three-year subjects at this school. First-year grades are used as
criteria,  which,  it  will  be  recalled,  constitute  the  same  grade  level  in 
theory  used  as  criteria  by  Larson  at  the  Eastman  School. 

Two  sections  of  a  large  class  in  musical  history  were  given  the 
T4  test.  Unfortunately  at  the  time,  the  number  of  students  on  whom 
grades  in  theory  could  be  obtained  was  relatively  small.  The  test 
was  given  near  the  close  of  the  first  semester  of  the  school  year,  and 
a  study  made  of  the  grades  available  at  that  time.  For  these  data, 
grades  and  test  scores  were  recorded  at  about  the  same  time. 

At  the  end  of  the  second  semester  it  was  found  that  a  number  of 
students  taking  the  test  earlier  in  the  year  had  completed  work  in 
various  subjects  in  theory.  Data  on  these  students  were  added  to  the 
material  already  assembled.    The  combined  criteria  represent  grades 
issued at different intervals from the time of giving the test, but in
the  interest  of  securing  a  greater  number  of  cases  they  were  brought 
together  into  another  study.  Table  IX  presents  the  data  for  both 
groupings. 

These  correlations  are  all  statistically  significant  at  the  one  per- 
cent level.  The  values  in  the  second  set  of  data  may  be  regarded  as 
more  stable,  since  they  are  supported  by  greater  numbers  of  cases. 

TABLE IX

Correlations of Scores on the T4 Test with Grades in Theory at
the Juilliard School of Music

                                                    N      M*      SD*      r

Grades issued at the time of testing
    T4 test and dictation                           27                     +.72
    T4 test and sight-singing                       29                     +.62
    T4 test and written theory                      31                     +.47

Grades earned at various times during the year
    T4 test and dictation                           70    12.38    6.42    +.60
    T4 test and sight-singing                       70    12.38    6.42    +.61
    T4 test and written theory                      69    12.24    6.43    +.46

* Indicates mean and standard deviation of the T4 test.

It  may  be  of  interest  also  to  note  the  relationships  existing  be- 
tween marks  in  the  three  courses  in  theory  at  this  institution.  A 
correlation  computed  on  over  one  hundred  cases  resulted  in  the 
following  relationships,  all  of  which  are  statistically  significant  at 
the  one  percent  level : 

                                       N       r

Sight-singing and dictation           109    +.71
Sight-singing and written theory      108    +.28
Dictation and written theory          109    +.48

The  entire  Juilliard  data  show  significant  relationships  between 
music  grades  themselves  and  between  the  different  subjects  and  test 
scores. It may be noted that the correlation of test scores and
grades in dictation shows the highest value for r, followed by grades
in sight-singing and written theory, respectively. In the relationships
between grades in theory, the two subjects of sight-singing and
dictation show the highest value for r, while grades in sight-singing
and written theory show a value of only +.28.

Studies  of  Group  Differences 

A  Study  of  Choral  Groups 

Three  studies  were  made  to  determine  if  choral  groups  differed 
significantly on the T4 test from other groups in the same schools
not engaged in choral activity. In all three instances the testing
was  sufficiently  extensive  so  that,  with  the  exception  of  a  small 
number  of  cases,  the  choral  groups  were  compared  with  the  re- 
mainder of  the  student  body  with  which  the  choral  group  was 
academically  identified.  Two  studies  were  made  in  junior  high 
schools,  and  one  in  a  college.  The  results  of  these  tests  are  reported 
in  Table  X.  School  number  1  was  the  Junior  High  School  at 
Virginia,  Minnesota.  School  number  2  was  at  Teaneck,  New  Jersey, 
while the college data were secured at the New Jersey State Teachers
College at Jersey City. The results in the t column denote significant
differences at the one percent level in favor of the choral groups;
that is, the choral groups differ significantly from the nonchoral
groups on the function measured by the T4 test.

TABLE X

Differences between Choral and Nonchoral Groups on the T4 Test

Groups Studied                         N      M      SD      D    SEdiff.    t

Special chorus, school No. 1           48   20.08   6.74
                                                            3.59    1.04    3.45
Remaining students, school No. 1      300   23.67   5.96

Junior chorus, school No. 2            25   21.64   6.07
                                                            5.26    1.32    3.98
Remaining students, school No. 2      261   26.90   7.19

Special college chorus                 38   17.65   5.87
                                                            5.16    1.04    4.97
Remaining college students            211   22.81   5.41
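
The D and t columns of Table X can be retraced from the other columns: D is the difference between the two group means, and t is D divided by the standard error of the difference. The Python sketch below is a check of that arithmetic only; the names are illustrative, and the standard errors are taken as reported because the text does not give the formula by which they were computed.

```python
# (comparison, chorus mean, remaining-group mean, reported SE of the
# difference, reported t), all copied from Table X.
TABLE_X = [
    ("school No. 1", 20.08, 23.67, 1.04, 3.45),
    ("school No. 2", 21.64, 26.90, 1.32, 3.98),
    ("college", 17.65, 22.81, 1.04, 4.97),
]

def difference_and_t(chorus_mean, remaining_mean, se_diff):
    """Return (D, t): the mean difference and its ratio to its standard error."""
    d = remaining_mean - chorus_mean
    return d, d / se_diff

results = {
    name: difference_and_t(chorus, remaining, se)
    for name, chorus, remaining, se, _reported_t in TABLE_X
}

for name, (d, t) in results.items():
    print(name, round(d, 2), round(t, 2))
```

Because scores are numbers wrong, a positive D means fewer errors for the choral group. The recomputed ratios match the table to within about 0.01; a difference of that size in the last comparison is what one would expect from working with the rounded D and standard error.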


Music  Student  Groups  and  Unselected  College  Students 

Two  studies  are  presented  which  have  for  their  purpose  the 
study  of  differences  between  groups  of  music  students  and  student 
groups  not  specializing  in  music.  The  first  study  compares  seventy- 
three  music  undergraduates  at  the  Juilliard  School  of  Music  with  249 
elementary  teacher  education  students  at  the  State  Teachers  College 
at  Jersey  City.  In  the  latter  group  are  included  the  choral  group 
for  that  institution. 

A  second  study  compares  the  scores  of  a  low  ability  theory  class 
of  forty-three  at  the  High  School  of  Music  and  Art  with  the  same 
249  students  of  the  first  study. 

Table  XI  presents  these  data  and  shows  values  for  t  of  11.20  and 
6.96  for  the  first  and  second  studies,  respectively.  Both  are  highly 
significant. It should be noted that students in a low ability class
in theory on the secondary school level are still significantly superior
to unselected college students with respect to scores on the T4 test.
A further relationship, not reported in Table XI, indicates that
Juilliard undergraduates are reliably better than the low ability
theory class.

TABLE XI

Differences between Music Students and Unselected College
Groups on the T4 Test

Groups Studied                     N      M      SD      D    SEdiff.    t

Juilliard undergraduates           73   12.50   6.57
                                                        9.52     .85    11.20
Unselected college students       249   22.02   5.79

Low ability theory class           43   16.17   4.95
                                                        5.85     .84     6.96
Unselected college students       249   22.02   5.79

Differences between Music Student Groups

Two  studies  with  music  student  groups  are  next  presented.  The 
first  study  compares  scores  of  two  groups  already  studied  in  other 
respects — the  advanced  theory  class  and  the  low  ability  class  at  the 
High  School  of  Music  and  Art  in  New  York  City.  Both  classes 
contained  students  with  from  five  to  six  semesters  of  theory  study. 
The  advanced  class  was  a  select  group  which  excelled  in  theory  and 
orchestration.  The  low  class  was  made  up  of  students  who  showed 
themselves  poor  in  theory  work  and  for  whom  remedial  instruction 
was  offered.  The  scores  of  both  were  studied  to  determine  whether, 
in  view  of  the  difference  in  status  of  the  two  classes,  they  would 
differ  significantly  on  the  T4  test.  Table  XII  shows  that  for  these 
two  groups  a  t  value  of  4.66,  highly  significant,  is  obtained. 

The  second  study  of  music  student  groups  consists  of  a  compari- 
son of  Juilliard  undergraduate  students  with  graduate  students  in 
music  at  Teachers  College,  Columbia  University.  The  Juilliard 
students  were  working  on  Bachelor  of  Arts  or  Bachelor  of  Science 
degrees, while at Teachers College the group consisted of candidates
for the Master's and Doctor's degrees. The purpose of this
study  was  to  determine  if  the  difference  in  status  of  the  two  groups 
would  show  a  significant  difference  between  the  scores  which  they 
earned  on  the  T4  test.  The  t  value  in  Table  XII  of  2.38,  which  is 
significant  at  the  two  percent  level,  shows  a  difference  in  favor  of  the 
graduate  students  of  music. 
TABLE XII

Differences between Music Student Groups on the T4 Test

Groups Studied                     N      M      SD      D    SEdiff.    t

Advanced theory class              27   10.20   5.30
                                                        5.97    1.28    4.66
Low theory class                   43   16.17   4.95

Juilliard undergraduates           73   12.50   6.57
                                                        3.55    1.47    2.38
Music graduate students            35    8.95   7.33

The  results  of  the  two  studies  with  music  student  groups  show 
that  significant  differences  may  be  found  between  certain  music 
groups  judged  to  be  different  either  in  ability  or  scholastic  standing 
with  respect  to  music. 
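
The critical ratios of Table XII can be approximately reproduced from the
tabulated N, M, and SD values. The following is a minimal sketch in Python,
assuming that the standard error of the difference is the square root of the
sum of the squared standard errors of the two means. The text does not state
which formula was actually used, and the function name critical_ratio is
introduced here for illustration only.

```python
import math

def critical_ratio(n1, m1, sd1, n2, m2, sd2):
    """Return (D, SE_diff, t) for two independent groups from summary values.

    Assumes SE_diff = sqrt(sd1**2 / n1 + sd2**2 / n2). This formula is an
    assumption for illustration, not one stated in the text.
    """
    d = abs(m1 - m2)
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return d, se_diff, d / se_diff

# Summary values as printed in Table XII (scores are items wrong).
theory = critical_ratio(27, 10.20, 5.30, 43, 16.17, 4.95)
degree = critical_ratio(73, 12.50, 6.57, 35, 8.95, 7.33)

for label, (d, se, t) in (("Theory classes", theory),
                          ("Undergraduate vs. graduate", degree)):
    print(f"{label}: D = {d:.2f}, SE_diff = {se:.2f}, t = {t:.2f}")
```

Under this assumption the computed ratios fall near 4.7 and 2.4, close to
the printed 4.66 and 2.38. The small departures are of the size that
rounding or a slightly different variance formula would produce.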

Differences between Two Elementary Schools

In  one  school  system  all  the  children  of  the  sixth  grades  had 
been  given  the  T4  test.  The  supervisor  of  elementary  school  music 
was  asked  to  designate  the  schools  where,  in  her  opinion,  the  status 
of  musical  participation  and  performance  was  highest  and  lowest, 
regardless of the causes of these differences. There were six schools
in all. School A was judged the superior school; schools X and Y
were together rated as poorest. The difference between the scores
of these schools is significant at the one percent level as shown in the
following tabulation:

                     N      M      SD      D    SEdiff    t

School A             55    26.6    6.0
                                          3.0    1.0     2.0
Schools X and Y      42    29.0    5.0

These data show that with a reported difference in the status of
music in these schools there is a significant difference on the function
measured by the T4 test.

Relationships with Standardized Music Tests

Tests Used

The published tests used in the correlations with the T4 test and
the forms employed are as follows:

The Knuth Test of Rhythm and Melody, Division 3, Form A^9
The Drake Musical Memory Test, Form A^10
The Seashore Measures of Musical Talents, Series B^11

9 Educational Test Bureau, Minneapolis and Philadelphia.

10 Public School Publishing Co., Bloomington, Ill.

11 The R.C.A. Manufacturing Company, Camden, N. J.
Description of the Tests

The  Seashore  Measures  of  Musical  Talents  consist  of  recorded 
tests  of  pitch,  loudness,  time,  timbre,  rhythm,  and  tonal  memory. 
Testing  was  limited  to  the  measures  of  pitch,  tonal  memory,  and 
rhythm.

The  test  of  pitch  measures  ability  to  differentiate  between  graded 
differences  in  the  frequency  of  two  tones  sounded  in  sequence.  The 
test  of  musical  memory  measures  the  ability  to  identify  certain  note 
changes  in  a  sequence  of  tones  played  twice.  The  test  of  rhythm 
measures  ability  to  recognize  changes  in  an  abstract  rhythmic 
pattern sounded twice. The reliability of Series B for the tests of
pitch, rhythm, and tonal memory is reported^12 as +.78, +.72, and
+.89, respectively.

The  Knuth  Test  of  Rhythm  and  Melody  is  a  test  purporting  to 
measure  ability  to  recognize  and  comprehend  music  from  its  nota- 
tion. A  four-measure  musical  phrase  is  played  on  the  piano  and 
the  student  is  asked  to  check  from  a  number  of  printed  examples  the 
melodic  excerpt  which  was  played.  An  experimental  tryout  used 
4,208  tests  on  which  material  was  standardized  for  different  aca- 
demic divisions.  The  reliability  for  the  third  division  of  the  test, 
used  in  the  study,  is  reported  in  the  manual  as  .840.  No  validity 
coefficients  are  offered  but  the  manual  reports  that  validation  was 
secured  by  the  "pooled  expert  judgments  of  six  music  supervisors 
and  college  teachers  of  public  school  music"  who  selected  the  ma- 
terial from  some  well-known  school  music  texts. 

The  Drake  Test  of  Musical  Memory  purports  to  measure  musical 
talent,  the  stated  assumption  being  that  musical  memory  is  the 
most  important  and  indispensable  aspect  of  such  talent.  Reliability 
for  college  groups  on  Forms  A  and  B  is  reported  in  the  manual  as 
+  .93  computed  by  the  split-half  method.  For  students  of  grammar 
school,  junior  high  school,  and  senior  high  school,  the  reliability  is 
+  .77,  +  .73,  and  +  .71,  respectively.  The  coefficient  of  validity  ranges 
from  +.499  with  age  and  training  partialed  out,  to  a  raw  validity 
coefficient  of  +  .671.  The  nature  of  the  criterion  for  validity  is 
not  stated. 

Administration  of  the  Tests 

The  tests  were  administered  to  students  at  the  State  Teachers 
College  at  Jersey  City.  Table  XIII  presents  the  available  data  on 
the  relationships  between  scores  on  these  published  tests  and  the  T4 
test. 


12  Saetveit,  Lewis,  and  Seashore,  loc.  cit. 
Only a limited interpretation of the results of these correlations
can be attempted, yet it seems appropriate to introduce some data
of this kind. With the exception of the Seashore test for rhythm,
all correlations are significant at the one percent level. The Knuth
and the Seashore pitch tests seem equally related, although the
correlation values are not exceptionally high. The correlation with
the rhythm test, on the other hand, is not high enough to be
significantly different from zero.

TABLE XIII

Correlations of Scores on the T4 Test with Scores on Three
Standardized Tests in Music

(Scores reported in terms of items wrong)

Test Correlated with the T4 Test        N      r

Knuth Test of Rhythm and Melody        215    .54

Drake Test of Musical Memory           161    .30

Seashore Measures:

   a. Pitch                             84    .53

   b. Memory                           103    .42

   c. Rhythm                            83    .17
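
The statements about significance can be checked against the N and r values
of Table XIII. The sketch below uses Fisher's z transformation with a normal
approximation at the two-tailed one percent level. This is offered as one
common way of testing whether a correlation differs from zero, not
necessarily the procedure followed in the study, and the function name
fisher_z_test is an illustrative assumption.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, alpha=0.01):
    """Two-tailed test of zero correlation using Fisher's z transformation.

    Returns (z_statistic, critical_value, significant).
    """
    z_stat = math.atanh(r) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_stat, critical, abs(z_stat) > critical

# (label, N, r) as printed in Table XIII.
table_xiii = [
    ("Knuth", 215, 0.54),
    ("Drake", 161, 0.30),
    ("Seashore pitch", 84, 0.53),
    ("Seashore memory", 103, 0.42),
    ("Seashore rhythm", 83, 0.17),
]

for label, n, r in table_xiii:
    z_stat, critical, significant = fisher_z_test(r, n)
    print(f"{label}: z = {z_stat:.2f}, significant at 1% = {significant}")
```

With these inputs every correlation except that for the Seashore rhythm
test exceeds the critical value of about 2.58, which agrees with the
interpretation given in the text.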

These  tests  are  intended  by  their  authors  to  measure  certain 
specific  responses  in  music.  The  Seashore  tests  of  pitch,  rhythm, 
and  tonal  memory  represent  more  or  less  isolated  capacities.  The 
Knuth  and  Drake  tests  are  specific  only  in  the  sense  that  they 
purport  to  measure  higher  units  of  complex  behavior,  such  as  the 
ability  to  recognize  and  comprehend  music  from  its  notation,  and 
general  musical  memory.  A  matter  of  note  is  that  both  tests  include 
melodic  and  rhythmic  patterns  of  response.  The  correlations  of 
+  .54  and  +  .30  for  the  Knuth  and  the  Drake  tests,  therefore,  indi- 
cate the  extent  to  which  the  function  measured  by  the  T4  test  is 
related  to  measures  of  the  other  two  tests  in  the  sampling  used. 
It  is  not  to  be  expected  that  they  would  correlate  to  any  marked 
degree  with  the  T4  test,  for  if  they  did  they  might  be  measuring  the 
same  thing.  The  data  show  that  this  latter  assumption  is  true  only 
in  part. 

There  may  be  some  importance  in  the  fact  that  the  correlation 
between  pitch  and  the  T4  test  is  as  high  as  it  is,  for  the  ability  to 
hear  small  differences  in  pitch  frequency  and  the  perception  of 
differences  in  interval  quality  have  usually  been  regarded  psycho- 
logically as  entirely  different  responses.  It  is  possible  this  is 
merely  an  attention  factor.  This  correlation  warrants  further 
study,  especially  since  the  Seashore  methods  of  testing  have  been 
called  into  so  much  question  in  recent  years. 
It is understandable that the Seashore rhythm test does not
correlate  significantly  with  the  T4  test,  for  there  would  appear  to  be 
no  similarity  between  the  perception  of  rhythmic  and  tonal  patterns 
when  both  are  presented  in  isolation.  One  implication  of  this  lack 
of  correlation  would  be  that  a  test  embracing  both  rhythmic  and 
tonal  patterns  could  hardly  be  diagnostic  for  the  separate  functions 
but  must  justify  its  use  on  some  other  basis. 

Summary  of  Findings 

A general summary of the results of the studies on the performance
of the T4 test may be divided into two divisions: first,
studies  made  of  group  differences  and,  second,  studies  of  individual 
differences  in  which  specific  criteria  on  student  ability  are  related 
to  test  scores. 

Studies  of  Group  Differences 

Extent  of  the  function 

The  studies  showing  significant  differences  between  certain 
homogeneous  groups  of  the  investigation  tend  to  indicate  the  pres- 
ence of  a  considerable  spread  of  the  function  of  interval  discrimina- 
tion in  a  range  of  ability  from  unselected  students  in  the  sixth 
grade  to  students  of  music  in  a  graduate  school.  These  studies  of 
differences  show  that  the  materials  of  the  T4  test  are  capable  of 
measuring  an  impressive  distribution  of  ability  regardless  of  the 
initial  sampling  in  the  secondary  schools  upon  which  the  experi- 
mental tests  were  based.  The  group  differences  were  revealing  in 
another  way.  They  showed  that  on  the  basis  of  certain  differences 
in  musical  status  of  various  student  groups  there  were  significant 
differences  between  their  scores  on  the  T4  test. 

Taken  as  a  whole,  the  study  of  these  differences  shows  a  progres- 
sion of  significant  differences  for  several  levels  of  advancement  in 
music.  Since  these  groups  also  vary  in  age,  musical  experience  and 
training,  and  aptitude,  the  differences  may  be  interpreted  in  a 
number  of  ways.  It  may  be  on  the  basis  of  increased  selectivity  of 
the  groups  higher  on  the  scale.  It  could  also  be  a  result  of  greater 
experience  and  training,  or  a  combination  of  these  factors  together 
with  the  factor  of  natural  or  innate  ability.  However  interpreted, 
these  differences  do  point  to  the  increasing  presence  of  the  ability 
measured  by  the  T4  test  the  higher  the  scale  of  musical  development 
progresses.  In  showing  the  existence  of  group  differences,  the  test 
has  fulfilled  a  certain  need  of  the  study  for  ascertaining  the  distri- 
bution of  this  function  throughout  a  large  span  of  academic  and 
musical standing. Indeed, the results of this phase of the study
indicate  a  distinct  need  for  several  forms  of  a  test  which  should  be 
standardized  for  specific  groups. 

Group  retests 

In  view  of  the  wide  span  of  ability  measured  by  the  T4  test,  there 
is  the  possibility  that  reliability  studies  may  show  only  limited  re- 
sults because  of  the  relatively  homogeneous  groups  which  were  used 
in  retesting.  These  retests  are  available  for  two  groups  only,  one 
for  students  in  grades  six  to  eight,  and  the  other  for  unselected 
college  students  none  of  whom  were  majoring  in  music.  At  the 
time  it  was  not  possible  to  obtain  retests  on  any  sizeable  number  of 
music  students  because  of  crowded  class  schedules. 

The retests for grades six to eight and for the unselected college
students, with their respective coefficients of +.74 and +.76, would
seem to compare favorably with those reported on similar groups for
the Seashore, Knuth, and Drake tests,^13 although all three are
computed by the split-half method and not on retests. The split-half
correlation of +.84 obtained in the present study can be considered
only in terms of the heterogeneous sampling used in the computation.
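
For readers unfamiliar with the split-half method, the sketch below shows
one conventional form of it: scores on the odd-numbered and even-numbered
items are correlated, and the half-test correlation is stepped up with the
Spearman-Brown formula. Whether the coefficients cited above include this
step-up is not stated in this passage. The item responses are hypothetical
and serve only to make the example runnable; all function names are
illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one list per student, 1 = item wrong, 0 = item right."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)

# Hypothetical responses for five students on a six-item test (not study data).
demo = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
r_half, r_full = split_half_reliability(demo)
print(f"half-test r = {r_half:.2f}, stepped-up estimate = {r_full:.2f}")
```

A split-half coefficient reflects internal consistency at a single sitting,
whereas a retest coefficient also reflects stability over time, which is why
the two kinds of figure are compared only with caution in the text.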

Group  differences  on  basis  of  certain  musical  criteria  applied 

Selected  groups  on  several  levels  of  musical  advancement  were 
compared  on  the  basis  of  certain  observable  or  reported  differences 
between  them.  On  the  sixth  grade  academic  level  it  was  shown  that 
for  two  schools  judged  to  be  different  on  the  basis  of  general  ac- 
complishment and  standing  in  musical  experiences  there  was  a 
significant  difference  on  the  function  measured  by  the  T4  test.  For 
the  junior  high  school  the  results  of  testing  in  two  widely  separated 
school  systems  showed  that  choral  and  nonchoral  groups  were  found 
to  differ  significantly  with  respect  to  scores  on  the  interval  discrimi- 
nation test.  For  senior  high  school  students  of  musical  theory  there 
was  a  significant  difference  on  the  function  tested,  between  groups 
classified  as  superior  and  inferior  in  this  subject.  In  a  college 
composed  of  students  not  specializing  in  music,  significant  differ- 
ences on  the  T4  test  were  found  between  the  choral  and  the  nonchoral 
student  groups.  Finally  among  music  students  themselves  there 
was  a  difference  on  the  function  tested  between  students  on  the 
undergraduate  and  the  graduate  school  level. 

These differences found on every level of ability tested, when
weighed against Flagg's assertion that tonal learning is a basic
factor for successive levels of advancement in musical experiences,
strongly support the belief that the function measured by the T4
test plays an important part in the successful pursuit of musical
education throughout a significant range of academic and musical
standing.

13 See pp. 74 f.

Studies  of  individual  differences  on  the  basis  of  musical  criteria 
applied 

A correlation of individual differences on test scores with
differences in specified criteria constitutes the usual validation
procedure for a given instrument of measurement. The selection of
various criteria in this study has proceeded according to a definite
philosophy and is discussed more fully in the final chapter. It is
sufficient to say at this point that the selection of these criteria is
based on what Seashore terms a "specific theory of measurement"
as opposed to the "omnibus theory which aims to validate . . .
against the total situation in musical performance."^14

The  correlations  of  +  .52,  +  .59,  and  +  .71  (Table  VIII),  using  the 
specific  abilities  associated  with  Flagg's  concept  of  tonal  learning  as 
a  criterion,  show  that  there  are  significant  relationships  observable 
between  individual  scores  on  the  T4  test  and  ratings  on  the  criterion. 
In  considering  these  results  due  recognition  should  be  made  of  the 
subjectivity  of  the  teacher  estimates  and  the  exploratory  nature  of 
the  test  itself.  The  point  to  be  stressed  is  that  the  relationship  is 
specific,  that  the  function  measured  by  the  test  has  been  correlated 
against a definite objective of instruction which has for its outcome
a "deepening penetration and precision in tonal learning" and an
"increasing awareness of tonal relations."^15

Studies using the criterion of grade in musical theory are specific
in a broad sense, if the statements of Larson^16 and the Theory
Committee,^17 previously quoted, may be taken as representative of
the universal purpose of such study. At the Eastman School success
in the course indicates general musicality, and the application
of administrative procedure is specific enough to consider
questionable any student who fails the course. At Teachers College,
according to the statement of the Theory Committee, an important
objective of theory study is a knowledge of the "structure of music,"
which in turn is essential for standards of taste and of needed
professional skills.

14 Saetveit, Lewis, and Seashore, op. cit., p. 47.

15 See p. 66.

16 See pp. 65 f.

17 See p. 67.

At  the  High  School  of  Music  and  Art  the  correlation  of  +  .55  for 
the  class  in  advanced  theory  has  shown  a  significant  relationship 
between  scores  on  the  T4  test  and  a  composite  measure  of  five 
semesters'  marks  in  the  study  of  musical  theory  by  students  in  this 
secondary  school. 

The  correlations  obtained  at  the  Juilliard  School,  considering 
the  possibilities  for  error  in  both  test  scores  and  theory  grades,  show 
impressive  relationships  between  measures  of  the  function  of  in- 
terval discrimination  and  success  in  several  courses  in  musical  theory 
on  the  first  year  level.  These  relationships  seem  especially  note- 
worthy for  work  in  musical  dictation  and  sight-singing  where  the 
r's  range  from  +.60  to  +.72.  These  latter  subjects  are  taught  in 
order  to  meet  specific  objectives  of  musical  education,  although,  to 
be  sure,  there  are  the  elements  of  rhythm  and  recognition  of  musical 
notation  studied  in  these  classes  which  have  not  been  taken  into 
account  in  the  discussion. 

Recognition is therefore made of certain elements in musical
structure  studied  in  theory  courses  which  are  not  related  to  the 
function  measured  by  the  T4  test,  chief  of  which  would  appear  to 
be  written  notation  and  various  patterns  of  rhythm.  It  would, 
therefore,  seem  important  that  the  psychological  function  of  interval 
discrimination  should  correlate  to  the  extent  that  it  does  with  grades 
which,  themselves,  represent  an  aggregation  of  activities  of  at  least 
a  semester  in  length  centering  upon  certain  aspects  of  the  study 
of  musical  structure. 

From  the  known  relationships  of  the  T4  test  with  the  concept 
of  tonal  learning  and  the  psychological  basis  of  the  part  which 
interval  awareness  plays  in  more  complex  tonal  experience,  it  may 
be  said  with  reasonable  assurance  that  the  study  of  tonal  relation- 
ships, as  differentiated  from  that  of  rhythmic  relationships,  must 
play  a  notable  part  in  the  work  of  these  theory  courses.  The  re- 
sults of  all  studies  show  that  the  function  of  interval  discrimination 
is  a  significant  index  of  individual  and  group  differences  using  cri- 
teria which  seem  to  center  on  aspects  of  tonal  imagery  or  tonal 
relationships.  Furthermore,  these  differences  seem  to  apply  on 
levels  of  ability  ranging  from  children  in  the  sixth  grade  to  music 
students  in  a  graduate  school. 

The  correlations  of  the  T4  test  with  other  measures  of  musical 
ability  are  too  limited  in  point  of  numbers  used  to  study  with  much 
seriousness. The data, however, indicate the possibility that there
is  a  certain  common  factor  operating  throughout  most  of  these 
published  tests,  but  the  available  correlations  appear  low  enough  to 
indicate  that  each  test  is  measuring  aspects  of  response  not  found 
in  others.  The  lack  of  any  significant  relationships  between  the  T4 
test  and  the  Seashore  test  of  rhythm  indicates  that  although  music 
itself  is  a  combination  of  tonal  and  rhythmic  patterns,  the  relation- 
ship between  them  is  not  at  all  close,  and  that  they  constitute 
different  psychological  responses. 


CHAPTER V

FINAL  EVALUATION  OF  THE  STUDY 

The  specific  findings  of  this  study  have  been  presented  in  Chapter 
IV  and  are  summarized  at  its  close.  The  material  of  this  chapter 
considers  the  general  significance  of  the  achievements  of  the  study. 
The discussion is divided into five sections: first, the significance of
the musical interval in relation to the study; second, the application
of the "specific theory of measurement" to the conduct of the
investigation; third, an evaluation of the contribution of the study as
a whole; fourth, the presentation of possible uses of the method of
testing developed in this research; and last, the relation of this
method of testing to other methods of the testing of musical aptitude
and talent.

The Importance of the Musical Interval Experience

Throughout  the  presentation  of  this  entire  investigation  certain 
aspects  of  the  nature  and  significance  of  the  musical  interval  have 
been  discussed.  In  order  that  a  final  evaluation  of  the  results  of 
this  study  may  be  as  comprehensive  as  possible,  a  review  is  made  of 
certain  outstanding  characteristics  of  the  musical  interval. 

Interval  Perception  a  Psychological  Function 

Perhaps the most effective way to emphasize the significant
nature of the interval as it is related to issues in this study is through
a negative approach. This is done by raising the point that a single
note by itself does not constitute music. It may have pitch,
duration, loudness, and quality, and still not be music. The
psychological implication of this statement tends to be obscured because
the sensations aroused by acoustical properties of physical sound waves
are often confused with the structural and psychological aspects of
musical cognition. A single note constitutes sound, and it may be
a very pleasing sound, but not until other notes are brought into
some relationship with the first note does authentic musical meaning
develop. Haydon,^1 in discussing the musical interval, is emphatic
on this point, and he maintains that the beginnings of music are
approached only when various patterns of tonal relations are
brought into play. He writes as follows:


1  Haydon,  Glen.     Introduction  to  Musicology,  p.  83.     Prentice-Hall,  New 
York,  1941. 
It has been repeatedly emphasized that tones in isolation do not constitute
music. ... In other words, when we begin to study the psychological aspects of
tonal relations as expressed by such terms as interval, scale, rhythm, melody,
and harmony, we are getting closer to actual music.

Haydon defines the musical interval psychologically as "the
perception of the relation between two tones with reference to pitch."
He terms it a perceptual process, a Gestalt of such simplicity as to
constitute a "fundamental unit of perception." Studies^2 indicate
that a musical interval tends to retain its essential perceptual
quality whether presented melodically or harmonically, although other
types of response accompany each of the two methods of presentation.
Intervals played in sequence tend to produce a feeling of
melodic movement and when the two notes of the interval are
sounded simultaneously there is produced a simple form of harmonic
feeling, sometimes referred to by theorists as a bi-chord.
Depending upon the way the interval is used functionally, therefore,
the unit of perception involved in the musical interval may
be the basis for melodic development on the one hand, or harmonic
development on the other. The whole system of tonal relationships
in music is based, together with rhythmic patterns, on designs which
emanate from these two sources of cognition of the musical interval.
Because the interval possesses within itself the basis for these later
developments, it has been chosen for study as an index of musical
ability, especially of a tonal nature.

Since the perception of the intervals is a psychological function,
it becomes a phase of mental activity, in contrast to sensory
reception of physical sound stimuli. Mursell in his work on the
psychology of music recognizes the psychological basis for the perception
of the musical interval and states that its effect depends upon the
selective response of the central nervous system.^3

This,  however,  is  the  same  psychological  process  involved  in  the 
perception  of  total  musical  experience.  In  the  same  work  Mursell 
contends that authentic musical meaning develops through the
"organizing and transforming operations of the mind"^4 as it acts
upon  physical  stimuli  of  patterns  expressed  through  sound.  It  is 
this  psychological  similarity  between  the  perception  of  the  interval 
and  the  cognition  of  larger  patterns  of  musical  meaning  which  has 
further  justified  the  study  of  interval  discrimination  as  an  index  of 
complex  patterns  of  musical  behavior. 

2 For an extended discussion of the psychological aspect of interval
perception see Haydon, Glen, ibid., pp. 82-83; 87-92; Mursell, op. cit.,
pp. 81-98.

3 Mursell, ibid., p. 82.

4 Ibid., p. 51.
Recognition of the part which interval awareness plays in tonal
experience is also made by Flagg. In her concept of tonal learning
she has included^5 the ability to sing a difficult interval as one of the
behavior  traits  she  associates  with  this  objective.  The  importance 
and  significance  of  the  interval  experience  is  also  indicated  by  the 
almost  universal  use  of  the  study  of  musical  intervals  as  a  basis  for 
subsequent  study  in  musical  theory. 

What  has  made  the  interval  important  in  this  study,  therefore, 
is  that  it  constitutes  the  simplest  unit  of  musical  perception,  and 
that  it  utilizes  the  same  psychological  basis  for  comprehension  which 
is  employed  in  the  understanding  of  musical  values  of  a  more  com- 
plex nature. 

The  "Specific  Theory  of  Measurement" 
An Issue Raised by Seashore

Seashore raises a very important issue in the theory of measurement,
particularly as he applies it to the construction and validation
of music tests. He refers to two distinct types of approach. The
most common procedure with reference to validation, he avers, is the
omnibus type,^6 in which a test or battery of tests is measured against
a blanket rating embracing either total musical behavior or complex
responses which frequently contain factors with little or no
relationship to the function measured by the test.

In contrast to this type of validation he cites a theory which has
motivated his own work, which he designates as "the theory of
specifics." In the validation of his measures of musical talents he
reports that he has steadfastly refused to correlate test scores
against what he terms "unanalyzed judgments about musical
achievement." This procedure he characterizes as the omnibus type
of validation both because an entire test battery has been considered
a single body of data, and because the test results have been related
to some very complex forms of musical behavior. For this reason
he states that he has always protested against the use of an average
of the six measures of his tests. It may be mentioned, in passing,
that there have been, nevertheless, some very careful studies by
others, using this type of procedure, among them the study of
Larson^7 which has been referred to in this work.


5 See p. 67.

6 Seashore, Carl E. "The Psychology of Music," Article No. XI. Music
Educators Journal, Vol. XXIV, No. 3, Dec. 1937, p. 25.

7 See pp. 65 f.
The "Theory of Specifics" Applied to the Present Study

It should be a matter of note that the spirit of "specific
measurement" has been maintained throughout most of the studies in this
investigation.  This  has  been  true  not  only  of  certain  types  of  cri- 
teria selected,  but  also  in  the  development  of  objective  test  responses. 
The  account  found  in  Chapter  II  in  ascertaining  the  most  desirable 
aspects  of  response  on  the  function  of  interval  discrimination  has 
shown  this.  It  is  seen  in  the  elimination  of  the  use  of  intervals 
played  in  sequence  from  early  testing  procedures  because  of  con- 
flicting responses  involving  patterns  of  pitch  direction,  pitch  dis- 
tance, and  melodic  movement.  It  is  shown  in  the  elimination  of 
the  "test  and  learning"  situation  which  depended  to  a  considerable 
extent  upon  memory  for  intervals.  It  is  shown  in  the  construction 
of  later  tests  which  presented  intervals  on  different  pitch  levels  in 
order  to  eliminate  the  factor  of  pitch  memory.  The  development  of 
the  multiple-response  item  with  its  resulting  concentration  of  quali- 
tative differences  in  interval  recognition  is  an  outgrowth  of  a  consis- 
tent purpose  to  strive  for  a  specific  and  valid  psychological  response 
associated  with  a  single  perceptual  ability. 

Thus,  the  entire  preliminary  work  of  the  investigation  resulted 
in  the  development  of  what  was  considered  a  single  isolated  unit  of 
perception.  This  unit  was  incorporated  in  an  objective  test  situ- 
ation while  at  the  same  time  its  identity  as  a  representative  function 
of  organized  musical  experience  was  maintained. 

The  selection  of  criteria  has  also  been  made  with  all  the  speci- 
ficity possible  under  the  circumstances.  The  underlying  purpose  of 
the  study  has  been  to  relate  the  function  tested  to  types  of  musical 
experience  which  have  for  a  basis  patterns  of  a  tonal  nature.  Re- 
gardless of  the  subjectivity  of  estimates  of  ability  in  this  area,  the 
experience itself is one of considerable objectivity. Seashore speaks
of this in a quotation already mentioned.^8 In the article quoted he
credits the capable musician with an ability to "hold up for detailed
and objective scrutiny the tonal situation which he wishes to
create." The fact that estimates on this ability are somewhat difficult
to  secure  should  in  no  way  obscure  the  specificity  of  purpose  in 
validating  a  test  against  such  criteria. 

One principal source of these criteria relating to tonal experience
is to be found in classes where instruction in musical theory is
undertaken. For periods of at least one semester instruction, drill
and assignments deal almost exclusively with aspects of tonal
relationships. Many and frequent observations are possible for
instructors during this period, and although final ratings may be regarded
as subjective, they are nevertheless based upon a series of
experiences which tend, in the end, to possess some reliability. The
importance of theory study has been emphasized elsewhere in this work.
The emphasis of the present discussion is on the definite nature of
these experiences taken as a whole.

8 See pp. 21 ff.

Ratings  on  tonal  learning  have  constituted  another  criterion  used 
in  the  validation  of  the  T4  test.  Tonal  learning,  as  Flagg  has  pointed 
out, is a desirable educational objective, a growth primarily
concerned with a "deepening penetration and precision in tonal
learning" and "rooted in increasing awareness of tonal relations."⁹
These  statements  reveal  the  singleness  and  definiteness  of  purpose 
underlying  this  educational  objective  which  has  been  an  aim  of  in- 
struction in  an  appreciation  course  in  a  leading  secondary  school. 

It  may  reasonably  be  questioned  whether  the  criterion  of  choral 
singing  used  in  studies  of  group  differences  is  specific  enough  to 
satisfy  the  requirements  of  the  theory  of  "specific  measurement." 
While  it  is  true  that  choral  membership  is  dependent  upon  such 
diverse factors as vocal quality, personal interest, and drive to
maintain  membership,  it  may  be  reasoned  that  success  and  satisfac- 
tion in  the  pursuit  of  choral  work  may  possibly  provide  the  most 
effective stimulus to continued activity. The requirements for such success may
be  defined  in  a  number  of  ways  by  different  authorities.  By  and 
large  they  would  appear  likely  to  center  about  the  ability  to  sing 
in  tune,  to  maintain  a  melody  or  an  inner  part,  and  to  memorize 
music.  It  seems  possible  that  these  requirements  could  have  for 
their  basis  an  authentic  feeling  for  tonal  relationships  which,  in  the 
last  analysis,  is  the  specific  area  of  ability  with  which  this  study  has 
been  concerned.  Therefore,  if  there  is  any  specificity  to  the  criterion 
of  choral  singing  used  in  this  study,  it  would  appear  to  concentrate 
on  these  phases  of  choral  requirements. 

Other  criteria  used  in  this  study  are  certainly  of  the  omnibus 
type,  but  they  were  used  in  supplementary  investigations  incidental 
to  the  main  purpose  of  this  research.  There  are  two  such  studies; 
first,  the  investigation  of  the  status  of  music  in  sixth  grade  elemen- 
tary schools  and,  second,  the  comparison  between  the  scores  of  under- 
graduate and  graduate  students  in  music. 


9 Flagg, Marion, loc. cit.

The Significance of the Results of the Investigation

The Significance of the Specific Relationships Ascertained

The  aim  of  the  entire  study,  as  has  been  reiterated  throughout 
this  work,  has  been  to  examine  the  effectiveness  of  the  function  of 
interval  discrimination  as  an  index  of  musical  ability,  particularly 
in  significant  areas  of  tonal  experience.  A  concomitant  of  this 
research  has  been  to  note  the  power  of  a  test  of  this  function  to 
differentiate  between  various  groups  on  the  basis  of  certain  differing 
musical  attributes  of  each  group. 

The knowledge of the relationships between test scores and the
various criteria is of importance in certain school and classroom
situations.  In  the  teaching  of  various  theory  courses,  for  example, 
a  test  of  the  T4  type  could  aid  in  diagnosing  capacity  for  under- 
taking such  work.  Further  specific  educational  uses  of  the  test  are 
suggested  in  a  later  section  of  the  chapter. 

It  is  also  of  interest  to  know  that  a  test  of  the  T4  type,  when 
administered on the secondary school level, may be of use specifically
if  objectives  of  instruction  are  the  same  or  similar  to  those  held  for 
tonal  learning.  Again,  such  knowledge  might  be  of  great  value  to 
chorus  directors  who  need  information  at  their  disposal  to  select, 
effectively,  prospective  chorus  members  from  large  student  bodies. 

Significant  relationships  have  been  found  between  test  scores  and 
criteria,  and  significant  differences  have  been  found  between  the 
scores  of  groups  which  differ  in  respect  to  status  in  music.  These 
results  in  general  have  not  been  totally  unexpected,  since  practical 
experience  of  music  educators  and  a  psychological  analysis  of  types 
of  interval  perception  show  the  function  of  the  test  to  be  a  basic 
factor  in  much  musical  experience  of  a  tonal  nature.  Consequently, 
these  statistical  studies  have  simply  substantiated  a  partial  belief 
already  existent  in  the  fundamental  nature  of  the  interval  experi- 
ence and  its  significance  in  musical  development  in  several  areas. 

The Significance of the Method of Testing Involved in the Investigation

There  are  two  aspects  of  a  discussion  of  the  significance  of  the 
method  of  testing  presented  in  these  chapters.  A  first  consideration 
is  the  significance  of  the  development  of  a  test  of  intervals.  The 
second  consideration  is  the  significance  of  the  method  of  utilizing 
measures  of  such  a  test  in  the  evaluation  of  musical  behavior  of 
various kinds.

A certain importance must be attached to the development of the
exploratory instrument of measurement, the T4 test, which is a
measure of interval discrimination. Through the efforts of preliminary
studies, reported in Chapter II, an objective instrument of
measurement has been devised with a degree of precision reflected in
reported reliability and validity coefficients. The importance of this
achievement  is  increased  by  the  circumstance  that  persons  with  no 
previous  experience  can  be  tested  in  this  manner,  and  that  such  a 
test  may  be  administered  to  a  range  of  academic  status  from  the 
sixth  grade  in  the  elementary  school  to  first-year  students  in  a  school 
of  music.  In  a  later  section  it  will  also  be  shown  that  the  literature 
of  research  has  recognized  the  need  for  an  objective  means  of  testing 
this  function. 

The  Significance  of  the  Criteria  Employed 

It  has  been  pointed  out  that  in  the  selection  of  criteria  for  the 
evaluation  of  the  T4  test  the  underlying  purpose  has  been  to  obtain 
ratings  definitely  associated  with  musical  activity  of  a  tonal  nature. 
Although  these  criteria  possess  different  names  and  definitions,  it  is 
shown  presently  that  they  constitute  various  functional  aspects  of 
the  area  of  tonal  relationships  taken  as  a  whole.  It  is  not  possible 
to  obtain  a  direct  measure  of  this  area  any  more  than  to  obtain  direct 
measures  of  so-called  intelligence,  for  all  mental  measurement  re- 
quires the  use  of  secondary  sources  of  data.  Therefore,  in  the  selec- 
tion of  measures  of  ability  of  a  tonal  nature  various  criteria  associ- 
ated with  aspects  of  instruction  in  tonal  relationships  have  been 
chosen  for  the  validation  studies  of  this  investigation. 

It  can  be  shown  that  the  study  of  tonal  materials  must  proceed 
through  the  same  three  avenues  which  prevail  for  all  musical  experi- 
ence, namely,  by  listening,  by  performing,  and  by  creating.  Theory 
study,  as  a  general  rule,  is  offered  in  the  form  of  instruction  in  dic- 
tation, sight-singing,  and  written  theory,  and  at  the  Juilliard  School 
of  Music  these  three  aspects  are  studied  in  separate  classes. 

A  course  in  dictation  may  be  defined  as  a  study  of  tonal  material 
based  essentially  on  the  function  of  listening.  Work  in  sight-sing- 
ing is  a  study  of  tonal  material  from  the  point  of  view  of  perform- 
ance, in  this  instance  the  performance  of  the  human  voice.  Written 
theory,  on  the  other  hand,  is  devoted  more  to  the  creative  aspects  of 
musical  experience.  All  three  phases  of  musical  theory,  however, 
may  be  said  to  have  a  common  requirement  of  an  ability  to  think  in 
terms of tonal materials of music. This assertion is borne out by a
statement by the Head of the Theory Department at the Juilliard
School  of  Music.  According  to  this  authority  all  theory  study  at  the 
institution  stresses  the  development  of  an  aural  perceptual  ability 
of  a  tonal  nature.  Stated  in  terms  of  classroom  techniques,  this 
means  that  the  student  is  first  expected  to  think  the  tonal  material 
through  silently  before  he  does  anything  with  it.  This  is  especially 
emphasized  in  written  theory  work  where  there  is  a  tendency  on  the 
part  of  most  students  to  proceed  according  to  stated  rules  rather 
than  through  educated  tonal  thinking. 

From  this  it  may  be  seen  that  the  criteria  based  on  these  three 
phases  of  theory  study  represent  three  different  aspects  of  tonal 
ability  in  music.  It  may  therefore  be  expected  that  ratings  derived 
from  these  three  sources  will  differ  somewhat,  and  that  they  will  be 
influenced  by  factors  related  to  the  medium  through  which  the 
activities  are  carried  on.  This  probably  holds  true  for  music  stu- 
dents taken  as  a  whole,  although  there  may  also  be  individual  dif- 
ferences. Thus,  some  students  may  do  their  musical  thinking  better 
through  performing  rather  than  through  listening,  while  others  may 
do  better  in  written  work.  This  is  somewhat  analogous  to  the  com- 
prehension of  verbal  material  where  certain  persons  tend  to  under- 
stand more  easily  through  the  spoken  word,  whereas  others  compre- 
hend more  effectively  through  material  which  is  written. 

The  study  of  tonal  material  is  not  always  separated  according  to 
these  divisions.  Sometimes  in  the  interest  of  economy  of  instruction 
or  of  integration  of  these  various  aspects  one  single  course  is  offered. 
Theory  work  at  the  High  School  of  Music  and  Art  has  for  a  basic 
philosophy  the  integration  of  these  aspects  into  single  courses. 

The  study  of  tonal  material  on  the  secondary  school  level  has 
been  integrated  with  work  in  general  appreciation  in  the  teaching 
of  Flagg  at  the  Horace  Mann  School.  Flagg's  objective  of  tonal 
learning  was  definitely  committed  to  the  objective  of  tonal  growth 
of  elementary  and  secondary  school  students.  The  educational  phi- 
losophy motivating  this  objective  has  been  discussed  elsewhere,  but 
the  specific  attributes  of  student  behavior  associated  with  the  total 
criteria  bear  repeating.  They  have  been  defined  as  "the  ability  of 
the  ear  to  lay  hold  and  to  perceive  tonal  material  based  upon  (a)  the 
ability  to  sing  a  memorized  melody  accurately  and  (b)  the  ability  to 
sing  a  dictated  melody  and  a  difficult  interval."  These  behavior 
traits,  it  will  be  seen,  are  associated  with  the  two  aspects  of  tonal 
experience  of  listening  and  performing. 

The significance of these validity studies and of the criteria
employed, therefore, transcends the individual noteworthiness of
correlations with certain types of theory study or of certain objectives
in  a  musical  appreciation  course  for  high  school  students.  Although 
the  criteria  may  be  subjective,  and  may  contain  some  extraneous 
factors,  they  appear  to  represent  the  best  possible  approach  to  esti- 
mates of  development  in  tonal  imagery  or  tonal  relationships  in  the 
broadest  sense.  The  significance  of  the  validation  proceedings  as  a 
whole  also  makes  it  possible  to  interpret  scores  obtained  on  various 
levels  of  musical  development  with  better  understanding  than 
before. 

The Differentiating Power of the T4 Test

In  the  data  presented  in  Chapter  IV  the  T4  test  was  found  to 
differentiate  between  a  number  of  groups  which  differed  with  respect 
to  certain  aspects  of  musical  development  or  experience.  These  dif- 
ferences taken  singly  are  important  enough,  but  the  outstanding 
significance  of  the  series  of  studies  is  that  the  same  function  which 
is  found  to  be  a  valid  index  of  tonal  development  at  higher  levels  of 
musical  advancement  is  also  effective  in  distinguishing  between 
groups  at  other  levels  of  advancement  in  music.  Thus,  the  two 
sixth-grade  groups  which  were  rated  different  with  respect  to  status 
in  music  were  significantly  different  on  the  T4  test.  Three  separate 
choral  groups  were  significantly  different  from  nonchoral  groups  on 
the  same  test.  Two  theory  classes  in  a  secondary  school  considered 
high  and  low  in  ability  also  differed  significantly  on  the  test  of 
intervals. 
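
The group comparisons described above can be illustrated with a brief computational sketch. It is an illustration only: the scores are invented, the name critical_ratio is hypothetical, and the passage does not state which statistic was applied. The ratio of a difference between two means to the standard error of that difference is assumed here as one common way of judging whether two groups differ significantly.

```python
from math import sqrt
from statistics import mean, stdev


def critical_ratio(group_a, group_b):
    """Difference between two group means divided by its standard error."""
    # Squared standard error of each mean: sample variance over group size.
    se2_a = stdev(group_a) ** 2 / len(group_a)
    se2_b = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)


# Hypothetical interval-test scores, invented purely for illustration.
choral = [34, 38, 41, 36, 39, 43, 37, 40]
nonchoral = [29, 33, 31, 35, 28, 32, 30, 34]

ratio = critical_ratio(choral, nonchoral)
print(f"critical ratio = {ratio:.2f}")
```

A ratio that is large relative to the chosen significance threshold would suggest that the two groups differ on the function tested; the threshold itself is a matter of convention and is not specified in this sketch.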

From  the  known  validity  of  the  test  on  higher  levels  of  musical 
development  it  may  be  reasoned  that  this  same  function  accompanied 
and  was  possibly  instrumental  in  the  successful  pursuit  of  musical 
endeavors  on  these  lower  levels  of  ability.  This  assumption  is 
strengthened  by  Flagg's  assertion  that  tonal  learning  is  a  basic 
factor  in  musical  growth  and  is  applicable  to  "young  children  hav- 
ing their  first  musical  experiences"  as  well  as  to  "older  students 
approaching for the first time a systematic organization of the
elements of past musical experience."

It  is  not  known  in  what  way  the  two  sixth-grade  groups  in  a 
certain  elementary  school  system  differed.  The  music  supervisor 
simply  reported  that  in  her  estimation  they  did  differ  in  general 
standing  in  music.  Moreover,  they  differed  significantly  on  the  T4 
test.  In  the  light  of  the  validity  of  the  test  with  estimates  of  tonal 
development,  and  the  statement  by  Flagg  that  tonal  learning  takes 
place at varying levels of age and maturity, one may reasonably
conclude that an important source of success in the one sixth-grade
class  over  the  other  was  the  difference  in  development  or  maturation 
of  experiences  of  a  tonal  nature. 

It  should  now  be  possible  to  attach  greater  importance  to  differ- 
ences between  choral  and  nonchoral  groups  on  the  function  mea- 
sured by  the  T4  test.  The  presence  of  significantly  greater  degrees 
of  this  function  can  mean  that  these  choral  groups,  either  through 
past experience and training, or native endowment, possess a
capacity for tonal relationships which, in turn, has a bearing on
successful pursuit of choral work.

There  is  a  further  meaning  to  be  attached  to  the  relationships  of 
the  T4  test  to  grades  in  theory  in  schools  of  music.  It  is  known  that 
students  in  advanced  theory  at  the  High  School  of  Music  and  Art 
had  received  instruction  and  drill  in  interval  recognition.  At  the 
Juilliard  School  of  Music  many  in  the  group  had  at  one  time  or 
another  received  such  instruction.  Nevertheless,  the  test  of  inter- 
vals was  effective  in  providing  a  certain  distribution  of  ability,  and 
against this distribution were correlated the differences in grades
which  resulted  in  the  validity  coefficients  which  have  already  been 
reported.  This  situation  points  to  the  possibility  that  there  may  be 
limits  to  the  development  of  interval  awareness,  and  if  so  it  points 
to  a  limitation  for  some  persons  in  the  development  of  tonal  experi- 
ences. 
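
The validity coefficients referred to above are correlations between test scores and course grades. A minimal sketch of such a computation follows, assuming the product-moment (Pearson) coefficient; the function name pearson_r and all of the figures are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two paired lists of values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired lists of at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data: interval-test scores and theory grades.
test_scores = [30, 35, 40, 45, 50]
theory_grades = [70, 72, 80, 79, 90]

r = pearson_r(test_scores, theory_grades)
print(f"validity coefficient r = {r:.3f}")
```

A coefficient computed in this way remains informative even when every student has had interval drill, provided that scores and grades still vary from student to student, which is the situation the paragraph above describes.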

The  difference  between  the  undergraduate  and  graduate  groups 
of  music  students  appears  to  be  of  minor  importance.  It  is  possible 
that  these  differences  were  occasioned  by  the  greater  training  and 
experience  and  increased  selectivity  of  the  graduate  group. 

Summary  of  Results 

Summarizing  the  discussion  of  this  section,  some  of  the  larger 
developments of the study are as follows:

1. The T4 test of interval discrimination is an index of ability in
   tonal experience and tonal learning.

2. The T4 test of interval discrimination has differentiating power
   at various levels of musical development from the sixth grade to
   the conservatory level.

3. The T4 test of interval discrimination can reliably distinguish
   between choral and nonchoral groups where the basis for the
   selection of choral candidates is similar to that which prevailed
   in the situations studied.

4. The T4 test of interval discrimination, under certain conditions,
   can reliably distinguish between sixth-grade classes judged
   superior and inferior on the basis of general musical development.

Limitations  of  Certain  Results  of  the  Study 

It  is  admitted  that  the  T4  test,  because  of  its  exploratory  nature 
and  the  wide  range  of  ability  which  it  has  measured,  may  not  be  as 
precise  as  it  might  be  had  it  been  constructed  for  a  more  limited 
range  of  ability.  However,  if  the  study  has  lost  on  this  account,  it 
has  gained  from  the  knowledge  gathered  on  the  different  levels  of 
ability  which  have  been  tested. 

Further  limitations  in  the  precision  of  criteria  are  also  admitted. 
In  addition  to  unavoidable  subjectivity  in  grades  and  other  esti- 
mates, these  criterion  values  probably  include  some  aspects  of  stu- 
dent ability  in  musical  notation  and  rhythm.  However,  as  far  as 
notation  is  concerned,  music  students  usually  have  an  acceptable 
functional  knowledge  of  this  factor  through  previous  experience  in 
musical performance. This would indicate that knowledge of
notation is possibly not a major aspect of grades in the theory courses
considered in this study.

The  part  which  rhythm  plays  in  theory  work  would  appear  to 
be  so  merged  with  the  study  of  tonal  relationships  as  to  be  almost 
inseparable.  Consequently,  the  various  theory  grades  used  as  cri- 
teria must  be  considered  as  including  unknown  quantities  of  this 
factor.  In  most  discussions  of  the  value  of  theory  study  the  domi- 
nant emphasis  seems  to  center  on  aspects  of  the  mastery  of  tonal 
material, with rhythmic aspects treated in a somewhat incidental
manner.  Nevertheless,  if  rhythm  did  play  any  prominent  part  in 
the  total  estimates  of  ability  in  these  courses,  the  reported  validity 
coefficients  with  theory  grades  would  appear  very  unusual,  indeed, 
in  view  of  the  lack  of  correlation  between  the  Seashore  rhythm  test 
and  the  T4  test. 

Studies  have  not  been  made  below  the  sixth-grade  level.  There 
would  appear  to  be  room  for  further  knowledge  in  this  area,  and  it 
should  be  learned,  if  possible,  at  what  ages  this  ability  first  begins 
to  develop. 

Uses  for  the  Methods  of  Testing  Developed  in  This  Study 

The  Meaning  of  Test  Scores 

A  discussion  of  the  usefulness  of  the  T4  test  must  be  based  on  a 
clear understanding of the meaning to be attached to test scores.
For this reason a short interpretation of the nature of the function
of  interval  discrimination  is  desirable.  This  can  be  in  terms  of  the 
knowledge  developed  of  the  function  itself  and  also  in  terms  of  a 
description  of  the  area  of  musical  ability  of  which  the  test  is  an 
index. 

One  of  the  primary  assumptions  of  the  study  has  been  that  the 
function  of  interval  discrimination  is  an  aptitude  which,  according 
to Warren's Dictionary, represents "a condition or set of
characteristics regarded as symptomatic of an individual's ability to acquire
with  training  some  knowledge,  skill,  or  set  of  responses  such  as  the 
ability  to  speak  a  language,  to  produce  music,  etc."  For  this  study 
a  further  interpretation  consists  in  the  assumption  that  it  represents 
present  condition  with  no  reference  as  to  whether  the  condition  is 
acquired  or  inborn. 

Previous  discussion  of  aspects  of  interval  discrimination  and  its 
related  field,  tonal  relationships,  has  shown  that  both  areas  are  af- 
fected by  factors  of  directed  growth,  maturation,  and  training. 
Statistical  studies  of  measures  of  the  T4  test  point  to  a  progressive 
increase  in  ability  as  groups  higher  on  the  scale  of  musical  develop- 
ment are  studied.  An  examination  of  philosophies  of  music  educa- 
tion concerning  individual  development  of  tonal  relationships  shows 
that  successful  work  in  music  as  a  whole  depends  upon  growth  in 
this  area  through  active  cultivation  and  directed  study. 

Measures  of  the  function  of  interval  discrimination  taken  at  any 
time,  therefore,  must  be  considered  as  representing  a  certain 
maturation  level  of  tonal  development  of  the  individual  tested,  and 
be  regarded  as  an  aptitude  which  becomes  progressively  better  with 
age, experience, and training. Such a measure obviously is a
product of native and acquired ability, and it would appear difficult to
conceive  of  any  psychological  measure  in  this  field  which  could 
separate  these  two  types  of  ability.  In  this  respect  the  dynamic 
qualities  of  the  function  differ  materially  from  the  comparative 
stability  which  exists  for  measures  of  sensory  capacity  obtained  for 
the Seashore test.¹⁰

It may be that for individuals there are limits, as Bingham
suggests,¹¹ to the development of different aptitudes, but this study
has not directly examined that phase of the problem. These
limitations have been suggested by the experiences of interval testing on
the conservatory level, from the observation that validation
procedures were significant in spite of previous training on intervals.
This method of testing will be most useful when the extent of these
limitations becomes known. The establishment of norms on
standardized forms of this test for various ability levels and for
different classifications of musical activity is also needed before
reliable diagnostic and prognostic work can be undertaken.

10 A study of the Seashore measures shows them to be relatively stable. See
Stanton, Hazel, and Koerth, W. Musical Capacity Measures of Adults
Repeated after Music Education. University of Iowa, Iowa City, Ia., 1930.

11 See p. 17.

What  may  be  established  at  present,  however,  is  that  measures 
of  interval  discrimination  appear  to  serve  as  an  index  of  certain 
aspects  of  musical  development  throughout  a  comparatively  large 
span  of  physical  and  musical  maturation.  These  measures  may  be 
likened,  in  a  sense,  to  scores  on  mental  tests  used  in  the  development 
of  mental  age  or  intelligence  quotients.  Test  scores  on  interval  dis- 
crimination, therefore,  appear  to  have  far  greater  importance  than 
measures  of  sensory  capacity  even  though  the  latter  represent  com- 
paratively stable  measures. 

Usefulness  in  Musical  Diagnosis  and  Prognosis 

Since  the  methods  of  testing  have  been  strictly  in  the  spirit  of 
the  "specific  theory  of  measurement,"  as  proposed  by  Seashore, 
they  are  to  that  extent  useful  for  a  number  of  diagnostic  purposes, 
for  they  are  measures  which  tell  of  present  ability.  By  the  same 
token  they  may,  under  certain  conditions,  be  prognostic  of  future 
development.  Not  only  may  such  a  method  of  testing  be  used  to 
evaluate  individuals  and  groups,  but  there  is  also  the  possibility  of 
judging  the  effectiveness  of  various  methods  of  teaching  certain 
subjects  in  music  through  such  testing. 

Usefulness  in  the  Teaching  of  Appreciation 

A  vital  link  between  the  study  of  theory  and  appreciation  of 
music  has  been  noted  in  the  statement  by  the  Theory  Committee 
quoted  on  page  67.  There  is  little  reason  why  certain  aspects  of 
appreciation  cannot  be  considered  distinctly  related  to  an  aware- 
ness of  the  tonal  values  present  in  a  given  musical  composition. 
There  appears  to  be  a  possibility  that  the  use  of  this  method  of  test- 
ing may  help  to  reduce  the  intangibility  which,  in  general,  seems 
to  surround  the  subject  of  music  appreciation.  The  test  may,  at 
least,  measure  aspects  of  appreciation  associated  with  an  awareness 
of  tonal  values  of  musical  composition.  The  results  of  such  testing 
should  also  be  useful  in  the  selection  of  material  presented  in  music 
appreciation classes.

Usefulness as an Entrance Test for Music Freshmen in College

The  correlation  of  the  T4  test  with  grades  in  music  theory  ap- 
pears sufficient  to  mark  this  method  of  testing  as  useful  in  the 
evaluation  of  freshmen  in  a  music  school.  A  student  may  be  ap- 
prised of  the  success  he  may  meet  in  the  study  of  music  theory.  The 
method  of  testing  may  also  be  useful  in  evaluating  the  effectiveness 
of certain types of theory or other music teaching, through the
administration of initial and final tests of interval discrimination.

Usefulness  in  the  Selection  of  Choral  Groups 

Directors  of  school  and  college  musical  organizations  are  usually 
pressed  for  time  and  opportunity  in  the  examination  of  candidates 
for  choral  work.  The  possession  of  a  good  voice  is  only  one  of  the 
requirements  for  successful  choral  work.  With  the  significant  dif- 
ferences found  between  choral  and  nonchoral  groups  on  the  T4 
test,  and  the  validation  of  the  test  on  other  bases,  the  usefulness  of 
the  test  as  an  aid  in  the  selection  of  candidates  for  choral  work  seems 
justified.  It  is  possible  that  in  other  schools,  for  a  basis  of  selection 
of  choral  candidates  other  than  tonal  ability,  the  test  might  not 
distinguish  between  choral  and  nonchoral  groups.  The  T4  test  as 
it  now  stands  has  been  used  as  an  aid  in  selecting  chorus  members 
at  the  New  Jersey  State  Teachers  College  at  Jersey  City  for  two 
years. 

Usefulness  in  Further  Research 

Throughout  the  presentation  of  various  topics  there  have  been 
occasional  references  to  opportunities  for  further  research.  Some 
of  these  possibilities  are  now  discussed. 

In  the  consideration  of  patterns  entering  into  the  construction 
of the final T4 test the validity of different types of interval
comparison was raised.¹² There are indications from these data that
ability  to  distinguish  between  the  mild  and  pronounced  dissonances 
is  more  related  to  success  on  the  test  as  a  whole  than  ability  on  the 
more  consonant  intervals.  These  data,  together  with  the  results  of 
Ortmann's  classification  of  interval  errors  (found  on  page  45),  may 
give  substantial  clues  to  a  future  research  in  this  area.  It  is  possi- 
ble that  validity  for  a  certain  interval  changes  somewhat  with 
musical  development.  If  so,  further  item  analysis  using  selected 
groups  will  answer  this  question. 

There is a definite educational need for a knowledge of the
improvability of the function of interval discrimination. Both the
rapidity and extent of improvement on the function should be
known for persons of various degrees of advancement in music.
This knowledge could be used with great effectiveness in adjusting
the rate and amount of instruction to individual differences. Tests
using the T4 type of measurement may assist in such studies.

12 See pp. 46 ff.; 54 ff.

Obtaining  norms  for  various  racial  groups  might  constitute  a 
valuable  research  study.  Some  research  might  be  concerned  with 
the  measurement  of  certain  races  which  use  a  system  of  tonality 
different  from  the  occidental  system. 

Some  study  of  the  relationship  of  the  capacity  for  differentiating 
between  small  variations  in  pitch  frequency  and  the  ability  to  dis- 
tinguish between  musical  intervals  should  be  made.  An  analysis 
of  the  two  responses  indicates  that,  psychologically,  they  are  not  the 
same.  However,  a  correlation  in  this  study  using  a  small  sampling 
has  shown  that  there  is  a  moderate  relationship  between  the  two 
responses.  The  basis  for  these  studies  could  be  the  Seashore  test  of 
pitch  and  the  T4  test  or  a  standardized  form  of  the  same  type. 

A  very  important  phase  of  such  a  study  would  be  the  determina- 
tion of  the  more  effective  index  of  ability  for  true  intonation.  One 
prerequisite  for  the  successful  pursuit  of  the  study  would  be  a  sound 
criterion  for  correct  intonation.  In  keeping  with  the  point  of  view 
adopted  in  the  present  study,  correct  intonation  is  here  regarded  as 
a  musical  function.  Consequently  the  criterion  for  this  ability 
should  be  based  on  psychological  and  not  acoustical  grounds.  A 
criterion,  acceptable  from  this  point  of  view,  would  be  the  feeling 
and  judgment  of  the  musically  competent  mind. 

It  is  possible  that  musical,  and  presumably  artistic,  intonation 
is  determined  by  feeling  for  interval  in  its  tonal  setting.  It  is  also 
possible that this sense of intonation may fluctuate and not be set
acoustically,  especially  if  the  tonal  situation  varies  in  any  way. 
This  hypothesis  contrasts  with  the  Seashore  assumption,  which  is 
that  true  intonation  is  determined  by  a  specifically  and  acoustically 
defined  pitch  and  that  certain  pitch  variations  on  the  part  of  a 
concert performer constitute artistic deviations from this standard.¹³
Seashore,  furthermore,  does  not  specify  the  nature  of  this  acoustical 
standard,  whether  it  is  the  "just"  or  the  "tempered"  system  of 
tuning.  Future  investigation  might  show  that  a  given  note,  the 
frequency  of  which  is  defined  by  the  "just"  or  the  "tempered" 
scale,  is  the  modal14  point  of  a  large  number  of  pitch  judgments
of  musically  competent  persons. 
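
The difference between the two systems of tuning can be made concrete by a short calculation. The sketch below, in Python, assumes the customary just ratios (5:4 for the major third, 4:3 for the perfect fourth, 3:2 for the perfect fifth) and the equal-tempered semitone of 100 cents. The function names and the table of intervals are illustrative assumptions and are not taken from the Seashore materials.

```python
from math import log2

# Assumed table: interval name -> (just ratio numerator, denominator, semitones).
INTERVALS = {
    "major third": (5, 4, 4),
    "perfect fourth": (4, 3, 5),
    "perfect fifth": (3, 2, 7),
}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents to the octave)."""
    return 1200 * log2(ratio)


def just_frequency(base_hz, name):
    """Frequency of the upper note of the interval in just intonation."""
    num, den, _ = INTERVALS[name]
    return base_hz * num / den


def tempered_frequency(base_hz, name):
    """Frequency of the upper note of the interval in equal temperament."""
    _, _, semitones = INTERVALS[name]
    return base_hz * 2 ** (semitones / 12)


def tempered_minus_just_cents(name):
    """How many cents the tempered interval is wider than the just one."""
    num, den, semitones = INTERVALS[name]
    return 100 * semitones - cents(num / den)


if __name__ == "__main__":
    for name in INTERVALS:
        print(name, round(tempered_minus_just_cents(name), 2))
```

On these assumptions the tempered major third is roughly 14 cents wider than the just major third, while the two fifths differ by only about 2 cents, so the choice of acoustical standard matters much more for some intervals than for others.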

13  Saetveit,  Lewis,  and  Seashore,  op.  cit.,  p.  45.

14  As  distinguished  from  mean  and  median.

It  may  be  doubted  whether  the  Seashore  test  of  abstract  pitch
discrimination  measures  an  important  musical  trait.  Furthermore, 
to  say  that  artists  use  this  ability  to  deviate  from  another  pitch 
standard  is  to  make  abstract  pitch  discrimination  dependent  upon 
some  absolute  pitch  standard.  This  latter  standard  is  not  defined 
in  the  Seashore  manual,  and  there  is  a  total  absence  of  any  pretense 
of  measuring  it  by  means  of  a  test  of  abstract  pitch  discrimination. 

Future  studies  should  ascertain,  if  possible,  where  the  modal 
points  for  certain  pitch  judgments  in  given  tonality  settings  are  to
be  found.  These  points  might  then  constitute  criteria  for  ability  in
intonation,  and  be  used  to  validate  various  tests  purporting  to
measure  intonation.  The  answer  to  the  problem  of  measurement  of
correct  intonation  may  well  lie  in  the  ability  to  think  in  terms  of  tonal
relationships.  If  so,  a  study  of  this  problem  could  make  use  of  the 
present  method  of  testing  interval  discrimination.  Gardner's
entire  system  of  violin  study  bears  out  this  hypothesis,  for  his  approach
to  correct  intonation  is  through  the  development  of  habits  of
"harmonic  thinking."15
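
The search for a modal point proposed above may be sketched as follows. The Python function below groups a set of pitch judgments, expressed in cents above a reference tone, into classes of equal width and returns the midpoint of the most frequent class. The name modal_point, the class width of 5 cents, and the sample judgments are all assumptions made for illustration; the judgments are invented and are not observations from this study.

```python
from collections import Counter
from math import floor


def modal_point(judgments_cents, class_width=5.0):
    """Midpoint (in cents) of the most frequently chosen class of judgments.

    If two classes are equally frequent, the one met first in the data wins.
    """
    if not judgments_cents:
        raise ValueError("at least one judgment is required")
    if class_width <= 0:
        raise ValueError("class_width must be positive")
    # Assign each judgment to a class by the index of its lower boundary.
    counts = Counter(floor(j / class_width) for j in judgments_cents)
    best_class, _ = counts.most_common(1)[0]
    return (best_class + 0.5) * class_width


if __name__ == "__main__":
    # Hypothetical settings of a major third by six judges (invented values).
    judgments = [398, 401, 402, 403, 387, 412]
    print(modal_point(judgments))
```

The mode is used here rather than the mean or median because the hypothesis concerns the point on which the judgments of competent persons cluster, not their average.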

Tentative  Classification  of  the  T4  Test

Because  of  the  exploratory  nature  of  the  T4  test  it  is  premature 
to  venture  a  complete  analysis  or  classification  of  its  status  among 
music  tests  or  among  tests  in  general.  However,  some  definite 
attributes  of  the  test  have  been  presented  in  various  sections  of  this 
work  and  are  now  drawn  together  for  the  purpose  of  identifying  the 
T4  type  of  musical  testing  in  its  relation  to  the  testing  field  as  a 
whole,  and  among  music  tests  in  particular. 

The  T4  test  is  a  test  of  aural  perception  and  must  take  its  place 
among  music  tests  measuring  that  type  of  response.  Although  it 
represents  ability  in  the  perception  of  tonal  relationships,  and  has 
many  applications,  it  should  in  no  wise  be  used  as  a  test  of  general 
musical  ability.  For  example,  the  lack  of  any  correlation  of  the 
test  with  the  Seashore  rhythm  test  makes  it  important  that  the 
former  be  supplemented  with  some  test  of  rhythm  before  an  ability 
for  tonal-rhythmic  relationships  can  be  determined. 

The  T4  test  may  also  be  identified  as  an  aptitude  test  in  the  sense 
that  it  answers  Warren's  definition  of  an  aptitude.  Although  it  is
not  a  test  of  innate  ability,  it  must  be  considered  as  being  made  up 
of  no  small  amount  of  this  factor.  In  another  sense  it  is  very 
definitely  an  achievement  test,  for  it  is  indicative  of  a  present  ability 
developed  through  past  experience,  environment,  and  training.  In
still  another  sense  it  is  a  measure  of  musical  intelligence  and
maturity,  for  it  constitutes  an  important  factor  related  to  musical
"thinking"  of  a  high  order.  It  is  a  test  measuring  a  trait  which
can  be  developed,  although  there  is  a  suggestion  that  there  are
definite  limits  to  such  development  which  some  time  should  be
determined  for  various  levels  of  initial  ability  on  the  function.

15  See  p.  11.

Specifically  the  test  has  been  related  to  success  in  certain  grades 
in  musical  theory,  and  to  behavior  traits  associated  with  the  concept 
of  tonal  learning.  It  is  possible  that  other  validation  studies  will 
show  significant  correlations  with  additional  specific  criteria.  In  a 
broader  sense  it  is  a  test  of  ability  in  the  entire  field  of  tonal 
relationships. 

References  to  the  Literature  of  Research  and  Testing 

Certain  references  in  research  literature  point  to  the  need  of  a 
test  of  tonal  imagery  and  also  of  a  test  using  as  a  basis  response  to 
musical  intervals.  Seashore  has  stated  that  if  he  were  limited  to  a
single  index  to  musical  talent  he  would  take  the  record  of  "a  natural
capacity  for  tonal  imagery."16  Previous  discussion  in  these
chapters  has  questioned  whether  innate  or  natural  capacity  for  this
ability  can  be  measured,  but  it  should  be  noted  that  Seashore  has 
recognized  the  need  for  a  measure  in  this  area,  pointing  out  that 
because  of  the  demands  for  objectivity  no  test  of  this  ability  has 
existed. 

Various  aspects  of  the  perception  of  musical  intervals  have  been 
used  as  a  basis  for  tests  of  musical  ability.  Schoen's  test,  reported 
in  Chapter  II,  constitutes  an  effort  in  this  direction,  but  the
observed  limitations  surrounding  this  particular  type  of  test  made  it
appear  unsuitable  for  purposes  of  this  study. 

The  consonance  test  of  the  original  Seashore  battery  constitutes 
another  attempt  to  obtain  a  measure  of  response  on  certain  aspects 
of  interval  perception.  This  test  secured  responses  of  "like"  and 
"dislike"  on  various  paired  intervals.  The  test  was  eliminated  from 
the  revised  Seashore  battery  not  because  the  function  seemed
unimportant,  but  because  the  method  of  measurement  was  beset  with
certain  difficulties.  A  quotation  from  the  monograph  which
discusses  the  revised  Seashore  tests  explains  why  the  consonance  test
was  not  used. 

One  of  the  original  measures,  the  consonance  test,  was  eliminated  on  the
ground  of  difficulty  in  securing  judgments  which  were  unaffected  by  harmonic
progression,  melodic  sequence,  and  feelings  of  like  and  dislike.  There  is,
however,  little  doubt  about  the  diagnostic  value  of  a  measure  of  consonance,  and
some  way  of  overcoming  the  difficulties  of  measurement  will  probably  be  found
in  the  future.17

16  See  p.  21.

Consonance,  according  to  studies18  in  the  psychology  of  the  musical
interval,  is  a  subjective  experience  which  has  changed  in  meaning
through  years  of  musical  usage,  and  organized  psychology  has
also  found  it  difficult  to  objectify  the  various  terms  used  in  describing
consonance.  However,  since  it  is  the  use  of  musical  intervals
which  has  appeared  to  be  the  basis  for  a  test  of  perceptual  ability, 
there  seems  to  be  no  reason  why  the  subjective  consonance  aspects 
of  interval  quality  need  to  be  measured.  The  results  of  the  present 
study  show  that  when  emphasis  is  directed  to  objective  differences 
between  interval  qualities,  significant  and  reliable  results  in  testing 
may  be  achieved.  Reactions  on  the  multiple-response  items  used  in 
the  T4  test  are  secured  literally  in  terms  of  "objective  scrutiny"19
of  interval  material  itself  and  not  in  terms  of  verbal  or  emotional 
concepts  aroused  by  feelings  of  liking  and  dislike  for  the  intervals. 

Naming  Future  Tests  Based  on  the  T4  Pattern  of  Testing 

This  work  is  brought  to  a  close  in  the  discussion  of  a  relatively 
simple  problem.  Throughout  this  entire  work  reference  has  been 
made  to  various  concepts  having  to  do  with  experiences  in  tonal 
relationships.  The  account  has  referred  to  a  method  for  obtaining 
index  measurements  in  this  area.  The  field  in  general  now  seems 
fairly  well  defined,  but  no  definite  name  has  been  given  to  the  interval
tests  themselves.  They  have  been  shown  to  be  related  to  various
estimates  in  the  area  of  tonal  development.  These  tests  of  intervals 
are  quite  likely  in  the  future  to  be  standardized  for  certain  ability 
levels  in  music.  For  purposes  of  later  identification,  therefore,  any 
test  form  constructed  by  the  author  in  accordance  with  the  pattern 
and  method  of  measurement  developed  in  this  study  will  be  known 
as  The  Madison  Interval  Test  of  Musical  Ability. 


17  Saetveit,  Lewis,  and  Seashore,  op.  cit.,  p.  7.

18  Mursell,  op.  cit.,  pp.  90-91. 

19  Seashore  uses  this  term  in  describing  the  objective  nature  of  tonal
experience.  See  p.  21.